The origin of this book probably occurred when I became a research
assistant in the group working with James Jenkins at Minnesota in 1960. 
Our task was to demonstrate the associationist principles that we thought 
were required to explain some of the syntactic knowledge people use when 
they speak grammatically. These experiments were massively unsuccessful, 
and this fact discouraged the effort prevailing at the time of developing an 
associationist explanation for language behavior. But a kind of desperate 
attempt to show that mediated transfer could be demonstrated in our 
experimental design led to the kind of study in which Joseph Scandura and 
I later applied statistical methods based on Markov models to analyze 
transfer of training. 

From 1964 to 1966, DaPolito was conducting experiments for his disserta- 
tion at Indiana University with the intention of showing that the elegant 
experimental design and statistical methods developed by Estes could be 
applied to the analysis of forgetting of individual items. They could indeed, 
but the results were quite surprising. DaPolito expected to measure the degree 
of interference between items from statistical dependencies between the
responses to a single stimulus. When the responses were found to be independent,
we had an unexpected and important result to consider.

In 1966 and 1967, Polson conducted experiments and analyses of the 
acquisition of verbally mediated concepts. He succeeded in providing a 
quantitative analysis of the process of acquiring a categorical concept, and 
his results were consistent with the hypothesis that such acquisition is an
all-or-none process. Polson also was able to derive the mathematical properties
of a system in which the subject acquires general knowledge about the
nature of the concepts to be acquired, at which time the probability of 
acquiring new concepts increases. 

James and I worked on the process of stimulus selection at Indiana in 
1964 and 1965. James’ experiments showed that the subjects stored partial 
representations of stimuli as long as they were still acquiring further items 
in the list, but acquired more complete representations during overtraining 
after all the items in the list had been learned. This seemed to implicate a
deliberate attentional strategy in the process of stimulus selection. Allen 
Harrington's dissertation conducted in 1967 and 1968 included further evi- 
dence that stimulus encoding is a process governed by systematic attentional 
strategies involving the complete task, rather than selection of aspects of 
individual stimuli on an item-by-item basis. 

While I was visiting at Stanford University in 1966, Michael Humphreys 
and I began working on the analysis of some data he had collected earlier, 
expecting to find support for the two stages of response learning and connec- 
tion learning that most theorists assumed for the acquisition of paired 
associates. The pattern of results that was obtained involved effects on the 
first stage of both stimulus and response difficulty, and only an effect of 
stimulus difficulty on the second stage. This led to the idea that the first 
stage of learning a paired associate should be considered as a process of 
storing a representation of the stimulus-response pair, and the second stage 
as learning a reliable way of retrieving the pair on tests. 

James and I collaborated on several experiments on negative transfer in 
the A-B, A-Br paradigm from 1965 to 1967. Much of this experimental work
was done by James at Indiana during the 1965-1966 year that I spent at
Stanford University. The two-stage Markov analysis indicated that nearly 
all the interfering effect was in the second stage of learning, which con- 
tradicted our expectation based on the hypothesis of associative interference 
that most of the difference between negative transfer and control conditions 
would occur early in learning, when the interfering associations would be 
the strongest. 

By the time James conducted his dissertation experiment on retroactive 
forgetting at Indiana in 1967 and 1968, we finally expected to obtain results 
inconsistent with associationist hypotheses. In order to obtain a coherent 
account of James’ rather complex findings, it seems necessary to postulate 
a set of retrieval processes considerably more complex than simple retention 
of associative connections. 

James and I worked together at Michigan in the summer of 1969, where
we reviewed the evidence we had obtained regarding negative transfer and 
forgetting. We prepared a draft manuscript for what we thought might be 
a long chapter on the topic, and I used the ideas we developed there in a paper
that I read at the American Psychological Association meetings. 

During those APA meetings in 1969, Jenkins and I happened to meet and 
had dinner together. I described the work that I was reviewing in my presentation
there, and I believe I said that except for a single remaining experiment,
I was intending to set aside the study of paired-associate learning in
favor of studying problem solving and mathematics learning. Jenkins won- 
dered whether I might collect the various published and unpublished studies 
of paired-associate memorizing in a book. It seemed a reasonable suggestion, 
since virtually all the writing was already done. The dissertations written by 
James, DaPolito, and Polson had not been published (though Polson's 
subsequently was), and it seemed that a book would provide both an oppor- 
tunity to present those results in a general context, and to reinterpret earlier 
findings in the light of conclusions that we had reached in the meantime. 

Of course, the job of cutting and pasting that I had imagined for compiling 
the book turned out to be insufficient. The material now in Chapters 5 
through 8, being largely taken from earlier writing—especially Polson's, 
DaPolito’s, and James’ dissertations and the draft chapter that James and I 
had written—did fit together in a simple way. But before returning to that 
part of the job, I wrote and rewrote a number of introductory sections, 
incorporating the results in Chapter 4 in various ways, and trying to fit 
together a coherent argument concerning the general issues. This was a task 
that seldom had my primary attention. I did move my main research focus to 
problem solving and mathematics learning, and maintaining that ongoing 
research program took priority over the completion of this book. It was not 
until the summer of 1974, during a sabbatical leave, that I found a period of 
time sufficient to complete a first draft. I am grateful to the John Simon
Guggenheim Foundation for support during that period. A subsequent draft 
was submitted to Jenkins and Prentice-Hall, and Jenkins was helpful in 
identifying some remaining incoherencies in the presentation that I hope 
have now been improved. 


I am grateful to my co-authors for their patience during the inexcusably
long time I took to get my writing for this book done. I am also grateful to
many other colleagues with whom I had the pleasure and good fortune to
interact during the time this work was carried out. I am especially indebted for
interactions concerning various aspects of the substance of this book with
Frank Restle, Edwin Martin, and Arthur Melton, whom I count as most
supportive and stimulating colleagues, as well as valued friends. Many other
teachers, colleagues, and students have enriched my cognitions as well, of
course, and I thank you all.


J.G.G. 


But association is far from being synonymous with experience. It is one way of coping 
with experience, one conception to treat experience scientifically. Therefore a criticism 
of associationism, however negative it may be, is not a rejection of a genetic theory. 
There are other, and I believe better, ways of treating experience than the concept
of association.


— Kurt Koffka (1935, p. 589) 


chapter 1 


In this book we present results of experimental and theoretical research on 
paired-associate memorizing, transfer, and forgetting that we carried out 
during a 15-year period beginning in the 1960s. Some of the results have not 
been published previously, or have been published only in brief summary 
form. Our interpretations of some of our previously published results have 
changed because of more recent empirical findings and theory. 

In the 1970s, the study of human memory changed dramatically. Investi- 
gators began concentrating mainly on studying ways in which knowledge is 
organized in memory and mechanisms used to retrieve organized knowledge
in answering questions and solving problems; they had set aside questions 
about the process of storing new components of knowledge. We, however, 
will discuss just those questions: those that explore the way in which new
components of knowledge are stored in memory. 

It is clear now that a theory of human memory must represent knowledge 
as a complex relational structure, with concepts and procedures linked
through specific relations. Such a theory seems incompatible with the basic
premises of associationism, which provided the conceptual basis for nearly
all the research on human verbal learning carried out in this century. It might 
be thought that a massive gap exists between current conceptualizations of 
memory and the considerable accumulation of research developed prior to
1970. We believe that the gap is real and compelling at the level of theory; that
is, we believe that the associationist theory of memory is fundamentally in
error. However, the empirical phenomena of rote learning should not just be
set aside as irrelevant to current theoretical issues. When a new theory is 
advanced, it is important to show that it can explain the phenomena that were 
formerly explained by the theory it replaces. This book takes up that part of 
the task concerning the main phenomena of paired-associate memorizing. 

The conclusion that associationist theory is mistaken was not apparent 
when we began our research in the early 1960s. In all but our latest experi- 
ments, the initial goal was to provide quantitative analysis of such processes 
as response acquisition, formation and loss of associative bonds, and facilita- 
tion of and interference with associative learning. We thought our research 
would contribute to the use of modern methods of analysis to further clarify 
and specify the nature of processes that have been generally believed to oper- 
ate in paired-associate learning, transfer, and forgetting. 

Our experimental results have led to quite a different conclusion. Instead of 
allowing us to fill in details of generally accepted theories, our results have led 
us to question the validity of those theories. In case after case, the conclusion 
has been that something was going on other than what we—and most others 
working on the problem—initially thought was going on. Our results were 
much more compatible with the Gestalt theory of association as that theory 
has been presented by such writers as Koffka (1935), Köhler (1947), and Selz
(1913). 

Gestalt theorists and associationists have differed in their views of the 
relationship between association and other cognitive processes. For associa- 
tionists, association is a basic process in nearly all cognitive activity, such as 
conceptual behavior (Underwood, 1952), problem solving (Maltzman, 1955), 
and perceptual learning (Postman, 1955). In Gestalt theory, association is but 
one rather peculiar special instance of more general processes of cognitive 
organization, such as grouping and differentiation. 

A conclusion that Gestalt theory is closer to the truth about paired-asso- 
ciate memorizing than associationist theory seems to us to have important 
implications. Paired-associate memorizing is the best task we have for
studying the formation of associations in a pure and simple form. Once
one becomes persuaded, as we have, that the cognitive processes involved in
paired-associate memorizing are different from the processes of connection
between elements that are central in associationist theory, then it becomes
difficult to foresee any important use of that theory in explaining other kinds
of cognitive activity. 


ASSOCIATIONIST THEORY AND EMPIRICISM 


The views stated suggest that this book will have a negative tone, primarily
aimed at showing associationist theory to be false. This is only partly true. We 
believe that our empirical results permit some significant extensions and 
elaborations of Gestalt theory, including incorporation of some modern concepts
of information processing. But to a considerable extent, we feel the
importance of this book is in the evidence against formation of undifferen- 
tiated connections as a basic process in memorizing and learning. We present 
this evidence with a strong sense of the importance of associationism not only 
as a substantive hypothesis but also as part of the attitude of empiricism that 
has provided a revolutionary concept of human nature in the last three cen- 
turies. 

Associationism became a fully developed theory of learning in the eight- 
eenth and nineteenth centuries, when Hartley, James Mill, John Stuart Mill, 
and Bain worked out the implications of epistemological ideas introduced in 
the seventeenth century, primarily by Locke (Boring, 1950). It was natural 
and important that the associationist theory of learning be developed at that 
time, since it fit naturally with the general view of empiricism, that human 
knowledge is derived from experience.

Empiricism was an idea of great philosophical importance in itself, but a 
considerable part of its significance lies in its relationship with a whole Zeit- 
geist of liberalizing ideas that were developed in Europe starting with the 
Renaissance. Of course, to characterize social change in the last 600 years as 
a consistent enfranchising of ordinary persons would oversimplify history in 
the extreme. But it is not unreasonable, even for us amateurs, to note that a 
greater proportion of human beings in Western society now have significantly 
more responsibility and freedom in their political, social, moral, and intellectual
affairs than they did in medieval Europe.

We maintain that just as the development of democratic government pro- 
vided a framework for political liberalization, and as the Reformation 
provided a framework for religious liberalization, so the development of 
empiricism and science provided a framework for intellectual liberalization. 
As long as people believed that knowledge was derived from divine revelation 
or from innate reason, the possession of knowledge was controlled as rigidly as 
the possession of material wealth. Not everyone was in a position to discover 
truth. Only persons skilled in the interpretation of scripture or disciplined in 
receptive meditation or trained in the rigors of correct argument could achieve 
new truths in systems of theological or rationalist epistemology. These skills 
and disciplines and training were not universally available: in fact, only a 
very small number of individuals were admitted to seminaries and academies 
where they could learn how to judge whether an idea was true or false. Furthermore,
the knowledge that untrained persons achieved could only be
received from persons who had become expert and who deigned to transmit 
their wisdom. In theological or rationalist systems that were dominant, basic 
principles could only be accepted on authority. As Locke remarked in 1690
concerning innate ideas,


it was of no small advantage to those who affected to be masters and teachers, to
make this the principle of principles,—that principles must not be questioned. For,
having once established this tenet,—that there are innate principles, it put their
followers upon a necessity of receiving some doctrines as such; which was to take
them off from the use of their own reason and judgment, and put them on believing
and taking them upon trust without further examination: in which posture of
blind credulity, they might be more easily governed by, and made useful to some
sort of men, who had the skill and office to principle and guide them. (Locke,
1690/1894)


In contrast, when it is believed that knowledge derives from experience 
rather than divine revelation or innate reason, a liberalized view of knowl- 
edge results. The source of knowledge—experience—now becomes some- 
thing that everyone has and can use; therefore, the possibility of discovering 
truth is no longer a restricted privilege. To be sure, a certain amount of training
helps a person to observe carefully and to draw correct conclusions; moreover,
an individual with greater experience can provide useful training for a
person of limited experience. But the differences that greater experience pro- 
duces are quantitative. In an empiricist epistemology, no person is completely 
dependent on training from others for acquiring knowledge. And a corollary 
is that every person has at least some basis for evaluating the truth of pronouncements
made by alleged authorities.

Associationist theory played a critical role in the development of empiricist 
epistemology. According to early empiricism, all people start life with very 
simple sensory experiences. The problem with this view is that it is difficult to 
explain how complex concepts and ideas are developed. The solution given by 
early empiricists was to attribute it to the process of association. Thus, the 
formation of association was a hypothesis that contributed critically to the 
plausibility of empiricism, in that associationism provided the learning 
mechanism needed for an empiricist epistemology to work. 

The hypothesis that learning is a process of forming associations among 
impressions and ideas has enjoyed nearly doctrinal status in European and 
American psychology for three centuries. When the problem of learning was 
dealt with directly, as it was by Hartley, James Mill, John Stuart Mill, and 
Bain, the question asked was how associations are formed; not whether for- 
mation of associations is the basis of learning. In this century, functionalists 
have elaborated on the theory of association, incorporating concepts such as 
response competition (McGeoch, 1942), unlearning (Melton & Irwin, 1940), 
response acquisition (Mandler & Heinemann, 1956; Underwood & Schulz,
1960), and response-set selection (Postman, 1963). When a theory of learning
was needed in the analysis of another psychological process, as with Wundt's
analysis of illusions of similarity and contrast, Titchener’s theory of meaning, 
and Freud's analyses of neurotic symptoms, the concept used to explain
changes in cognitive [...] association between impressions or ideas.

Behaviorists have [...] stimulus-response functions using a
somewhat operationalized version of William James’ and Pavlov's idea of
associations between brain processes.


ALTERNATIVE GESTALT HYPOTHESES


Gestalt psychologists objected and offered an alternative hypothesis to the 
idea of undifferentiated associative connections. The Gestalt hypothesis may 
have been foreseen by John Stuart Mill, who postulated a kind of mental 
chemistry in which the compound formed by association between ideas has 
properties that are not inherent in either of the ideas when they are not asso- 
ciated. An explicit alternative view was formulated by Selz in 1913. The 
Gestalt theory of association was developed in fairly extensive discussions by 
Köhler in 1929, Koffka in 1935, and Katona (1940), who carried out an extensive
series of empirical studies.

Selz identified the critical hypothesis of associationist theory, which he
called the hypothesis of diffuse reproductions: the idea that information is
stored in the form of undifferentiated connections and that retrieval of a
response is determined solely by the combined strength of connections leading
to the response. In place of this idea, Selz proposed what he called a hypothesis
of specific responses. The substance of this hypothesis is illustrated in the
following example, which deals with performance in a task of giving generic
concepts as associations to a set of stimulus words. Suppose that when the
stimulus is "farmer," a subject gives the response, "occupation." Selz said,


... the task "generic concept" and the relevant stimulus word "farmer" cannot
be treated as factors acting in isolation, but rather that they act like the coherent
question "What is the generic concept for farmer?" This question of the experimenter
already anticipates schematically the knowledge-unit (or structure)
“Farmer is an occupation” which the subject has previously acquired. The question
contains one member (A) of the known facts of the case and the relation (y)
to the other, sought-for member; in this case the relation is of species to genus.
The question can, therefore, act as an eliciting condition for the intellectual operating
of knowledge-production [Wissensaktualisierung], whereby the uncompleted
knowledge-unit (A y) which the question represents is completed by
restoring the reproductive unit (AyB). Instead of a diffuse play of competing
reproductive tendencies, this theory offers a comprehensive process wherein the
question acts as a unitary total task along with a uniquely relevant operation of
knowledge-production. This operation can be shown to be a special case of
structure-completion, since the fragmentary structure of the question is made
complete by the operation of knowledge-production. (Selz, 1964, pp. 227-228)


For our purposes, the main point of Selz’ hypothesis of specific responses
is that knowledge units are relational structures, not undifferentiated connections.
Köhler (1947) similarly argued that the cognitive structure produced
by learning an association is a single memory trace, representing the associated
elements as “relatively segregated subunits,” rather than as two individual
traces connected by a bond.
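
The contrast Selz drew between diffuse reproduction and structure completion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the connection strengths, the extra knowledge units ("farmer uses plow", "dog is an animal"), and the function names are hypothetical and do not come from Selz or from the experiments reported here. The sketch shows that summing undifferentiated strengths over the cues can select a response unrelated to the task, whereas completing the partial unit (A, relation, ?) returns the specific member that stands in the requested relation.

```python
from collections import defaultdict

# Hypothetical connection strengths (arbitrary integer units), invented
# purely for illustration: cue -> {response: strength}.
STRENGTHS = {
    "farmer": {"plow": 6, "occupation": 3},
    "generic concept": {"animal": 5, "occupation": 2},
}

# Knowledge units stored as relational structures (A, relation, B).
# Only "farmer is an occupation" comes from the text; the rest are assumed.
KNOWLEDGE = [
    ("farmer", "species-to-genus", "occupation"),
    ("farmer", "uses", "plow"),
    ("dog", "species-to-genus", "animal"),
]


def diffuse_reproduction(cues):
    """Pick the response with the largest combined strength over all cues."""
    totals = defaultdict(int)
    for cue in cues:
        for response, strength in STRENGTHS.get(cue, {}).items():
            totals[response] += strength
    return max(totals, key=totals.get)


def structure_completion(member, relation):
    """Complete the partial unit (member, relation, ?) from stored units."""
    for a, rel, b in KNOWLEDGE:
        if a == member and rel == relation:
            return b
    return None


print(diffuse_reproduction(["generic concept", "farmer"]))
print(structure_completion("farmer", "species-to-genus"))
```

With these made-up strengths the diffuse rule answers "plow", because that response has the largest summed strength, while structure completion answers "occupation". The numbers were chosen to make the difference visible and carry no empirical weight.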

Köhler also presented an objection to the associationist view that ideas
become connected merely by occurring together. In no other branch of
science, Köhler pointed out, do we assume that objects or events become
functionally related regardless of their individual properties. For example,
"in chemistry, atoms react or remain indifferent to each other depending upon
their given characteristics" (Köhler, 1947, p. 153).

The view taken by Köhler was that an association is learned when the elements
to be associated are organized into a unit, so as to form a unitary trace
of the experience that incorporates both elements. He said, 


... contiguity in space and time favors association only because, under the name
of proximity, it is a favorable factor in organization. Now, this condition is just
one among many others which all have a favorable influence on organization,
and since it now appears that organization is the really decisive condition of what
is commonly called association, the rule of association may have to be reformulated
accordingly. (Köhler, 1947, p. 158)


Köhler discussed recall primarily in terms of attitudinal factors. He also
remarked that recall depends on a compromise. On the one hand, recall of
an association presupposes a stable organization of the associated elements,
that is, a unitary trace. On the other hand, if one associated element is presented,
it will not produce recall if it is too thoroughly absorbed in the organization
of the trace that constitutes the association in memory. Recall (or as
we will say, retrieval) thus requires that "the process which is now given
resembles some region within the organized trace of the whole experience"
(Köhler, 1947, p. 17).


Koffka (1935) also took the view that learning is a process of organization,
with an association being the formation of a unitary trace representing
both of the associated items. Koffka strengthened the Gestalt analysis of
learning by focusing attention on the concept of a trace's stability. He postulated
“that a trace will influence a process in such a way that the reactive
influence exerted by the process on the trace will not diminish, but if possible
increase, the latter’s stability” (Koffka, 1935, p. 563).

In addition to the implications of this idea for processes of acquisition,
the postulate provides the basis of a theory of retrieval. An implication of the
postulate is that “if a process communicates with part of a trace system, the
whole trace system will exert a force on the process in the direction of making
it as complete as it was when it created the trace system" (Koffka, 1935, p.
567). This implication fo[...] segregated subunit”) becomes active, the whole
trace system becomes active, because trace systems tend to maintain their
stability. Retrieval of associations is therefore a process of redintegration.


GESTALT THEORY AND EMPIRICISM 


If an empiricist epistemology like Locke's is taken as a starting point, the
Gestalt theory of association is simply unacceptable. Locke's empiricism
requires a theory of learning that can start from a blank tablet; associationism
asserts that concepts and other complex ideas are formed out of
elementary impressions. But Gestalt theory claims that associations are produced
by cognitive organization. The concepts and relationships that are the
basis of grouping and differentiation among objects are also the basis of association.
If the views of Selz, Köhler, and Koffka are accepted, we must reject
the important epistemological idea that knowledge derives entirely from
experience.

If Locke’s empiricism were the only kind possible, the acceptance of a
Gestalt theory of association would be paradoxical. Our preference for an
extension of Selz’, Köhler's and Koffka's hypotheses over those of McGeoch,
Melton, Underwood, and Postman is based on evidence. We have been led
to a view that opposes our initial hypotheses by the evidence of our observations,
that is, by our experience. It would be paradoxical if this event forced
us to give up the view that knowledge is based on experience.

On the other hand, the way that experience leads to knowledge may be
different from that given in Locke’s kind of empiricism. An alternate formulation
has been given by Popper (1935, 1959) in which evidence is used not
to support hypotheses but rather to correct theories by showing that hypotheses
are false. In an epistemology like Popper's, concepts and relationships are
not induced from experience. Rather, a person's present beliefs form the basis
of expectation about what the person will experience. When experience is
incompatible with expectations, the person can then change beliefs, moving,
one would hope, closer to the truth. Popper's philosophical arguments seem
consistent with Piaget’s general hypotheses about cognitive development, in
that experience that does not fit with present schemata can lead to an accommodation
of cognitive structure that gives a more adequate basis for interacting
with the environment.
Popper’s idea of falsification and Piaget's concept of accommodation provide
a theory of knowledge that emphasizes observation and experience as much
as the older empiricist view. Concepts and ideas are not derived initially from
experience; the mind is not originally a blank tablet. But concepts and ideas
must be checked against experience. And most important, when people
disagree about the way things are, their disagreement can be settled, if it can
be settled at all, by testing their respective views against empirical observation.
Experience is the court of last resort in resolving differences of opinion,
and arcane texts or authoritarian pronouncements about what is “reasonable”
do not provide valid resolutions of disputes.


EMPIRICISM AND LIBERALISM 


If the empiricist philosophy of knowledge is modified to permit innate 
concepts and ideas of relations, one must ask whether empiricism loses its 
effectiveness as a libertarian viewpoint. In at least one respect, its liberalizing 
effect is strengthened. Persons with governmental power, academic d[...],
and other positions of institutional authority tend to have an advantage of
experience and access to evidence. If knowledge-claims are valid to the extent 
that they are supported by evidence, then a person with institutional authority 
is in a strong position, because opposition almost always rests on a smaller
quantity of data than that which can be collected in support of an official
view.

But if experience is mainly a corrective factor, the advantage of institutional
authority is greatly reduced. The empirical status of a theory depends less on
the quantity of evidence supporting the theory than on the quality of that 
evidence, especially as it relates to competing alternative theories. An estab- 
lished theory can be disproven by documenting a single falsifying fact. This 
implies a considerable advantage to the dissenter. A person does not need a
lifetime of experience nor access to all the accumulated data to develop effec- 
tive opposition to the prevailing view. If a weakness of assumption in the 
established theory is understood, then it is reasonable to expect that an item 
of falsifying evidence can be discovered and documented. When this is done, 
the established view is shown to be false, regardless of the quantity of evidence
consistent with that view. Thus, while classical empiricism was a liberalizing
idea regarding knowledge, the view that evidence serves mainly as a corrective
of false opinions leads to a further liberalization by showing the extent to
which all persons have access to the basis of knowledge and dissent.


SUMMARY 


We will present evidence that we think argues convincingly against commonly
held views about the process of learning associations. Rather than the
view that associations are etched by experience on a blank tablet, we are led
to a hypothesis that association is a form of cognitive organization, depending
on relational ideas that the learner already has in cognitive structure.
Because of the central role of associationism in classical empiricism, we
recognize that our evidence and the conclusions we draw from it tend to
undercut a widely held version of the empiricist theory. However, the views
we develop about learning are consistent with a reconstructed empiricism in
which experience is used mainly to correct false opinions. This revised empiricism
is reasonable and strengthens the libertarian implications of empiricism
regarding accessibility of the basis of knowledge. Our main interest in this
book is scientific: the central issue is the truth of alternative hypotheses about
how simple associations are acquired in experiments. We believe that our
evidence leads to a more nearly correct hypothesis about the nature of simple
associative learning than the view that ideas simply become connected. But
we are pleased that it also encourages an epistemology that seems to us more
nearly correct and more strongly libertarian than the philosophical views
in which associationism was originally embedded.


chapter 2 

Organization and Association 


The central claim in Gestalt theory about association is that it is but 
one form of cognitive organization. Organization is fundamental, although it 
occurs in various ways. One outcome of organization can be association 
between two elements into a new cognitive unit. 


In this chapter, we first briefly review two groups of theories that consider 
general principles of organization. 


Quillian recognized that our knowledge about word meanings is like a network, 
consisting of relations among concepts. To represent this knowledge, Quillian 
used five types of connections: 


1. category membership; for example, “tree” is in the category “plant,” and 
“plant” is in the category “structure.” 

2. modification; for example, a plant is a structure of a particular kind, namely, 
a living structure. 

3. disjunction; the relation between members of a set of alternatives. 

4. conjunction; the relation between members of a set of requirements. 

5. connections between relations and their arguments; for example, the 
information that people use machines has the relation “use” connected to 
“people” and “machine” through this relation. The system must distinguish 
different forms of this kind of connection, since many relations are not 
symmetric. 


In addition to the connections, Quillian’s system uses quantitative modifiers 
to represent information such as plants frequently have leaves, or plants 
are not animals. 

An example of a knowledge structure that can be constructed in Quillian’s 
model is given in Figure 2-1. The diagram represents the information for the 
first dictionary definition for “plant”: Living structure that is not an animal, 
frequently with leaves, getting its food from air, water, or earth. The diagram 
shows plants as members of the category “structure,” modified by a conjunction 
of four concepts: they are living; they are not animal; they frequently 
have the relation "with" to leaf; and they have the relation "get" to a 
structure consisting of the relation "from" between food and a disjunction of 
air, water, or earth. 

Figure 2-1 Knowledge structure for one meaning of “plant,” in Quillian’s (1968) model. 
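
The structure just described can be sketched as nested data. This is only an illustrative encoding, not Quillian's program: the dictionary keys, the `relation` helper, and the `atoms` function are hypothetical names introduced here, and the layout is one of several that would fit the description of Figure 2-1.

```python
# Hypothetical encoding of the "plant" definition; all names are assumptions.

def relation(name, subject, obj, quantifier=None):
    """Connect a relation to its ordered arguments (connection type 5)."""
    return {"relation": name, "subject": subject, "object": obj,
            "quantifier": quantifier}

plant = {
    "concept": "plant",
    "member_of": "structure",              # type 1: category membership
    "modifier": {                          # type 2: modification
        "conjunction": [                   # type 4: all four requirements hold
            "live",
            {"concept": "animal", "quantifier": "not"},
            relation("with", "plant", "leaf", quantifier="frequently"),
            relation("get", "plant",
                     relation("from", "food",
                              # type 3: a set of alternatives
                              {"disjunction": ["air", "water", "earth"]})),
        ],
    },
}

def atoms(item):
    """Collect every atomic concept name reachable from a structure."""
    if isinstance(item, str):
        return {item}
    if isinstance(item, list):
        return set().union(*(atoms(x) for x in item))
    found = set()
    for key in ("concept", "member_of", "modifier", "conjunction",
                "disjunction", "subject", "object"):
        if item.get(key) is not None:
            found |= atoms(item[key])
    return found

print(sorted(atoms(plant)))
```

Keeping `subject` and `object` as separate keys reflects the remark that many relations are not symmetric, so the two arguments must stay distinguishable.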

Anderson and Bower's (1973) model, Human Associative Memory (HAM), 
uses a somewhat different set of relational connections. The information 
encoded in HAM represents propositions. To represent a proposition, HAM 
constructs a tree in which nodes designate components of the proposition, 
and labeled connections indicate what relation each node has with other 
nodes and with the proposition as a whole. One node represents the propo- 
sition. A second optional node, representing a context, can be included. A 
third node represents the fact, something that occurred. Related to the node 
for context can be nodes representing location and time. Related to the node 
for a fact are nodes representing subject and predicate, and related to a predi- 
cate can be nodes representing relation and object. HAM also provides 
quantitative relations, specifically, set membership, set inclusion, and the 
universal quantifier. 

Figure 2-2 gives an example of a knowledge structure that HAM can construct. 
The proposition represented here would be expressed by the sentence, 
"A hippie touched a debutante in the park at night." The topmost node 
represents the proposition. There is context (C), consisting of a location (L), 
some park, and a time (T), an unspecified night. The fact (F) includes a 
subject (S), a hippie, and a predicate (P), which designates a relation (R), 
touching, and an object (O), a debutante. 
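
A minimal sketch of such a tree, using the node labels C, L, T, F, S, P, R, and O from the description above. The function names `ham_proposition` and `terminals` are hypothetical, and nested dictionaries are an assumption made for illustration; HAM itself is not specified at this level in the text.

```python
def ham_proposition(subject, relation, obj, location=None, time=None):
    """Build a proposition tree with an optional context node."""
    tree = {}
    context = {}
    if location is not None:
        context["L"] = location
    if time is not None:
        context["T"] = time
    if context:                      # the context node is optional
        tree["C"] = context
    tree["F"] = {"S": subject, "P": {"R": relation, "O": obj}}
    return tree

def terminals(tree, path=()):
    """Yield (path of labeled links, concept) for every terminal node."""
    for label, child in tree.items():
        if isinstance(child, dict):
            yield from terminals(child, path + (label,))
        else:
            yield path + (label,), child

example = ham_proposition("hippie", "touch", "debutante",
                          location="park", time="night")
for path, concept in terminals(example):
    print("-".join(path), concept)
```

Each terminal concept is reached by a path of labeled links, which is what distinguishes this kind of structure from a connection that carries only a strength.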


Figure 2-2 Representation of “A hippie touched a debutante in the park at night,” in HAM (Anderson & Bower, 1973). 

Still another model, by Norman and Rumelhart (1975), develops the 
representation of propositions in somewhat greater detail, using many semantic 
relations of the kind analyzed by Fillmore (1968). The central nodes represent 
actions or other kinds of relations specified by verbs. These relational terms 
are connected to other relations in ways that are specified in the structure. 
The example shown in Figure 2-3 corresponds to the sentence, “Peter put 
the package on the table.” The upper part of the diagram indicates that someone 
named Peter put something that is a package onto a location that is the 
top of something that is a table. The lower part of the diagram represents 
general knowledge about the meaning of “put.” Norman and Rumelhart's 
model uses this kind of general knowledge in processing the information 
presented in sentence form. "Put" refers to an event in which some action, 
performed by an agent, causes a change in location of some object from one 
place to another. Thus, hearing the sentence, "Peter put the package on the 
table," the system fills in specific components in various slots that are 
specified in the schema that corresponds to knowing what "put" means. 

Figure 2-3 Representation of “Peter put the package on the table,” in Norman and Rumelhart's (1975) model. 
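
The slot-filling idea can be illustrated with a small sketch. The function `instantiate_put` and its key names are hypothetical; they are loosely modeled on the verbal description of "put" given above and are not taken from Norman and Rumelhart's implementation.

```python
def instantiate_put(agent, obj, to_location, from_location="<unknown>"):
    """Fill the slots of a schema for "put": an agent's action causes an
    object's location to change from one place to another."""
    return {
        "action": "put",
        "agent": agent,
        "object": obj,
        "causes": {
            "change": "location",
            "object": obj,
            "from_state": {"at_location": from_location},
            "to_state": {"at_location": to_location},
        },
    }

event = instantiate_put("Peter", "package", "top of table")
print(event["causes"]["to_state"])
```

The sentence does not say where the package was before, so in this sketch the starting location simply stays unfilled; the general schema supplies the slot even when the sentence does not supply its value.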

The propositional structure corresponding to various sentences and para- 
graphs has been studied extensively by Kintsch (1974). Two analyses that 
were used in an experiment by Kintsch and Keenan (1973) are shown in 
Figure 2-4. The information in each sentence appears as a set of proposi- 
tions—four propositions in the sentence about Romulus, and eight proposi- 
tions in the sentence about Cleopatra. Each proposition specifies a relation, 
named by the first word in parentheses, and one or more ideas that are con- 
nected to the relation in the proposition. For example, the proposition indi- 
cated by “TOOK, ROMULUS, WOMEN, BY FORCE” has “TOOK” as 
the relation (in HAM we would find a relation “TAKE,” with “PAST” as 
part of the context). “ROMULUS” could be identified as the agent of the 
relation, “WOMEN” as its object, and “FORCE” as its instrument. In the 
case of propositions such as “LEGENDARY, ROMULUS” and “SABINE, 
WOMEN,” the first terms are properties, rather than relations, and modify 
the respective second terms. 

In all the theories reviewed here, information is stored in memory in the 
form of relational structures and the information is relational at two levels. 
The first level includes such relations as category membership and subset 
relations, as well as grammatical relations such as subject and predicate, and 
case relations such as agent and object. At another level, much of the stored 
information involves relations that are named by the system and treated as 
concepts, connected to other concepts in the network. In Kintsch’s analyses, in 
fact, the structural details are suppressed in order to focus attention on the 
properties and relations specified in propositions. 

Romulus, the legendary founder of Rome, took the women of the Sabine 
by force. 

1. (TOOK, ROMULUS, WOMEN, BY FORCE) 
2. (FOUND, ROMULUS, ROME) 
3. (LEGENDARY, ROMULUS) 
4. (SABINE, WOMEN) 

Cleopatra's downfall lay in her foolish trust in the fickle political figures 
of the Roman world. 

1. (BECAUSE, α, β) 
2. (FELL DOWN, CLEOPATRA) = α 
3. (TRUST, CLEOPATRA, FIGURES) = β 
4. (FOOLISH, TRUST) 
5. (FICKLE, FIGURES) 
6. (POLITICAL, FIGURES) 
7. (PART OF, FIGURES, WORLD) 
8. (ROMAN, WORLD) 

Figure 2-4 Propositional representations of two sentences (Kintsch & Keenan, 1973). 

For our purpose, the major conclusion to be drawn is that storing of 
relatively complex relational structures occurs naturally and frequently in 
ordinary human cognitive activity. This does not imply that the theories 
reviewed here give a complete and accurate explanation of such processes as 
language comprehension. On the contrary, the task of explaining such pro- 
cesses is far from complete and much is still not understood. However, it 
seems very likely that persons store most of their knowledge in a form similar 
to that specified in these theories. 


THEORY OF PATTERN RECOGNITION 


Another group of theories deals with the organization of information used 
in identifying patterns. For several years it has been generally agreed that the 
process of recognizing a familiar pattern such as a letter or word, or a friend’s 
face, involves analyzing the features of the stimulus pattern, rather than com- 
paring the stimulus with a stored template of the pattern (see Neisser, 1967). 

One system that recognizes patterns by analyzing features is a program 
called Pandemonium (Selfridge, 1959). Pandemonium begins with a set of 
feature detectors, each of which is set to respond if a specific feature is present 
in the stimulus. Some, for example, detect oblique lines, others detect horizon- 
tal lines, and still others, curved lines. When the stimulus is the letter A, 
the detectors for oblique lines and a horizontal line respond, and the detectors 
for curved lines are silent. At the next level are cognitive demons that listen 
to the responses of the feature detectors. Each pattern that the system can 
recognize has a cognitive demon that corresponds to a set of features. For 
example, the demon for A could have these features: two oblique lines, a 
horizontal line, three acute angles, and two obtuse angles. Each cognitive 
demon listens and responds to the feature demons corresponding to those 
included in its set of features. If all the feature detectors corresponding to the 
cognitive demon's features are responding, and no other feature detectors 
are responding, then that cognitive demon responds at full strength. If one 
or more features in a cognitive demon’s set are absent, or if features are 
present that are not in the demon’s set, that demon's response will be reduced. 
The system makes a final decision about the pattern by comparing the 
strengths of response of the various cognitive demons, choosing the pattern 
that corresponds to the demon who is shouting the loudest. 

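The decision rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not Selfridge's program: the feature names and the three demons are hypothetical, and the strength formula (matched features divided by all features either expected or present) is just one simple function that is at full strength for an exact match and is reduced by missing or extra features.

```python
# Hypothetical feature sets; a real system would use many more features.
COGNITIVE_DEMONS = {
    "A": {"oblique_left", "oblique_right", "horizontal"},
    "H": {"vertical_left", "vertical_right", "horizontal"},
    "O": {"closed_curve"},
}

def response(demon_features, active_features):
    """Strength is 1.0 for an exact match; missing or extra features lower it."""
    matched = demon_features & active_features
    considered = demon_features | active_features
    return len(matched) / len(considered)

def recognize(active_features):
    """Choose the pattern whose cognitive demon responds most strongly."""
    return max(COGNITIVE_DEMONS,
               key=lambda name: response(COGNITIVE_DEMONS[name], active_features))

print(recognize({"oblique_left", "oblique_right", "horizontal"}))
```

Each demon here holds only an unordered bundle of features, so nothing in the structure says how the features are arranged relative to one another.
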
Pandemonium, true to its name, has its knowledge of patterns organized 
very weakly. Each pattern that can be recognized is represented by a unit of 
the system, constituting a “bundle” of features. 

Systems with more structure have also been developed, which recognize 
patterns by an analysis that uses a decision network. Two systems that develop 
network structures for pattern recognition are Concept Learning System 
(CLS), developed for learning categorical concepts (Hunt, Marin & Stone, 
1966), and Elementary Perceiver and Memorizer (EPAM), developed as a 
theory of verbal learning (Feigenbaum, 1963). 

Hunt's CLS includes features in its decision net for a concept based on a 
sample of stimuli that are designated as positive and negative instances of a 
concept. As an example, suppose a category is defined by the following 
combination of properties: triangle, and either red in color or having a striped 
border. Several stimuli are shown, and for each stimulus it is indicated whether 
the stimulus is or is not an instance of the concept. All positive instances 
of the concept will be triangles with striped borders or red in color or both. 
Negative instances will be triangles with neither striped borders nor red color, 
and any figure that is not a triangle. Note that any nontriangular figure with 
either a striped border or red color or both is also a negative instance. By 
comparing sets of negative and positive instances, CLS can arrive at a correct 
way of classifying the stimuli, showing the result in the form of a decision tree. 
Figure 2-5 illustrates the example discussed here. Note that the information is 
a structure of features, connected by links that are labeled either "yes" or "no," 
which are the two possible outcomes of tests that can be applied to a stimulus. 


Figure 2-5 Representation of a categorical concept, "Triangle, and either red in color or having a striped border," in CLS (Hunt, Marin & Stone, 1966). 

When CLS has learned a concept, it has a structure that uses feature detectors, 
just as Pandemonium does. However, the feature tests are carried out in a serial 
order, and the outcome of each test determines what other tests will be carried 
out. This interdependence among tests means that some stimuli can be 
classified more quickly than others. For example, in Figure 2-5, a negative 
instance that is not a triangle would be rejected more quickly than a triangle 
that has neither red color nor a striped border. In experimental tests of 
differences predicted between times to classify various stimuli, the results have 
been in agreement with predictions of the kind illustrated here (Trabasso, 
Rollins & Shaughnessy, 1971). 
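
The serial character of such a decision tree, and the resulting difference in the number of tests needed, can be sketched as follows. The function `classify` is hypothetical. Testing for a triangle first follows from the example in the text; testing color before border is an assumption made here for illustration.

```python
def classify(stimulus):
    """Return (is_positive_instance, number_of_tests_carried_out)."""
    tests = 1
    if stimulus["shape"] != "triangle":      # first test: triangle?
        return False, tests
    tests += 1
    if stimulus["color"] == "red":           # second test (assumed order): red?
        return True, tests
    tests += 1
    return stimulus["border"] == "striped", tests   # third test: striped border?

circle = {"shape": "circle", "color": "red", "border": "striped"}
plain_triangle = {"shape": "triangle", "color": "blue", "border": "plain"}
print(classify(circle), classify(plain_triangle))
```

If each test takes time, a nontriangle is rejected after one test while a triangle lacking both properties requires three, which is the kind of ordering in classification time that the experiments examined.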

Another model, designed to simulate rote verbal learning, is EPAM 
(Feigenbaum, 1963). EPAM constructs a recognition network that identifies 
the items in a list of nonsense syllables that might be presented in a paired- 
associate or serial learning experiment. The main process carried out in the 
system is discrimination learning. The network that EPAM constructs 
includes a set of features that allows the system to differentiate each item from 
other items in the list. 

When a list of associations has been learned by EPAM, the system has 
stored a structure of tests and branches. Each test examines a stimulus attri- 
bute, such as the identity of a letter or a phonemic or graphemic feature. The 
outcome of each test determines which feature will be tested next. The 
discrimination net that is acquired is detailed enough so that each stimulus and 
each response gives a unique pattern of outcomes of the tests included in the 
net. Then, when a stimulus is presented, a series of tests occurs leading to a 
unique terminal node. At that node is stored a (usually partial) image of the 
stimulus. 

Each stimulus image has stored with it information about the response 
paired with that stimulus. This information, called a cue, comprises a partial 
list of the properties of the appropriate response. When a stimulus has been 
presented and the terminal node for that stimulus has been reached, the 
response cue stored is used as another entry in the discrimination net, and 
the features specified in the cue are examined in another pass through the 
network. This process should end at a terminal node containing a response 
image, which must be detailed enough to permit production of the response. 

Acquisition of the discrimination net involves adding new test nodes 
whenever the present net is inadequate for correct performance in the 
paired-associate task. This occurs when the series of tests on a stimulus fails to 
lead to a response cue, or when the response cue that is obtained leads to a 
terminal node without a response image or with an incorrect image. When this 
happens, one or more new nodes are added to the net, so that the stimulus 
and response will be represented appropriately at terminal nodes and will be 
distinguished from other terms already included in the net. 

Figure 2-6 illustrates the kind of net EPAM builds (Feigenbaum, 1963). At 
the point shown, the experiment has included two pairs of nonsense trigrams: 
DAX-JIR and PIB-JUK. In the example, EPAM created a test to 
discriminate the two terms of the first pair. This first test examines a feature of 
the first letter; it could be a visual feature such as "curved line at right," or 
a phonemic feature such as ē present in the sound of the letter’s name. The 
first test gives a positive result when the stimulus DAX is presented; however, 
this test does not discriminate PIB from DAX. A positive result is also 
obtained on the first test for PIB. Therefore, a second test is needed. The second 
test uses an additional feature of the first letter; if the features being tested 
are phonemic, the second test might be the feature “plosive.” Then PIB gives 
positive results for both tests while DAX gives a positive result for the first 
test but a negative one on the second. 

Figure 2-6 Discrimination network for two paired associates (Feigenbaum, 1963). 

Both response terms in this example sort down the negative side from the 
first test; neither response passes a test such as the one for ē in the name of the 
letter. To discriminate the two responses, a letter different from the first 
letter must be tested. A third test could use a feature of the last letter that 
would separate K from R. The terminal nodes for stimuli contain cue 
information used to identify responses. When the terminal marked P is reached, 
the system must find information that results in a second pass that will 
terminate at JUK, and similarly the system needs to store information with D that 
will result in a sort to JIR. 
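
The two-pass retrieval described for this net can be sketched as follows. This is not Feigenbaum's program. The helper names are hypothetical, and the letter sets used for the second and third tests are stand-ins chosen only so that the outcomes match those described in the text (P passes the second test and D fails; K and R are separated by the third); they are not a phonetic analysis.

```python
E_SOUND_LETTERS = set("BCDEGPTV")   # letter names containing the sound "ee"
SECOND_TEST_LETTERS = {"P"}         # assumption: stand-in for the second test
THIRD_TEST_LETTERS = {"K"}          # assumption: stand-in for the third test

def node(test, yes, no):
    return {"test": test, "yes": yes, "no": no}

def leaf(image, cue=None):
    return {"image": image, "cue": cue}

NET = node(lambda item: item[0] in E_SOUND_LETTERS,
           node(lambda item: item[0] in SECOND_TEST_LETTERS,
                leaf("P", cue="J-K"),      # partial stimulus image plus response cue
                leaf("D", cue="J-R")),
           node(lambda item: item[2] in THIRD_TEST_LETTERS,
                leaf("JUK"),
                leaf("JIR")))

def sort_item(net, item):
    """Apply tests in series until a terminal node is reached."""
    while "test" in net:
        net = net["yes"] if net["test"](item) else net["no"]
    return net

def respond(net, stimulus):
    """First pass finds the stimulus terminal; second pass sorts its cue."""
    cue = sort_item(net, stimulus)["cue"]
    return sort_item(net, cue)["image"]

print(respond(NET, "DAX"), respond(NET, "PIB"))
```

The cue holds only the letters the net actually tests, which matches the description of a cue as a partial list of the properties of the response.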


An important theoretical study was conducted by Hintzman (1968), using 
the idea of a discrimination net similar to that used by Feigenbaum. 
Hintzman’s program, called Stimulus and Association Learner (SAL), is simpler 
than EPAM in some ways. Instead of storing a series of tests to identify each 
response, SAL stores only the response item at the terminal node of the tree 
used to sort stimuli. This means that SAL cannot be used to investigate ideas 
about response acquisition, and serial learning cannot be accomplished by 
SAL, as it can be by EPAM. On the other hand, this simplification provides 
a theoretical advantage. Hintzman wished to investigate the extent to which a 
process of stimulus discrimination can be used to explain a variety of 
phenomena in associative learning. If the system used to simulate learning 
involves a complicated set of processes, it is more difficult to determine which of 
several processes or their interactions are important in producing 
performance of the system, and theoretical inference is therefore more difficult. 

As with EPAM, learning in SAL is a process of adding nodes to a 
discrimination net. However, SAL includes stochastic parameters. Whereas 
EPAM always adds a test node whenever an error occurs, SAL does so with 
probability a. This means that there are trials when SAL's response is 
different from the one called correct by the experimenter, but SAL does not 
change its way of sorting the stimulus. In these cases, the response called 
correct by the experimenter on that trial replaces SAL's stored response 
with probability b. The basic mechanism used by SAL explains a considerable 
variety of experimental facts, such as the effect of similarity among 
stimuli in the list being learned, and the effect of the number of response 
alternatives used. The use of stochastic processes enabled Hintzman to simulate 
experiments, obtaining several different sequences of responses for the 
parameter values used and getting information about the amount of 
intersubject variability that SAL produces. 

To deal with some additional phenomena, two other processes were 
studied. In SAL II, when a correct response is given there is a probability c that 
an additional test node will be formed for the item. This process causes SAL 
II to produce overlearning and explains some effects due to amount of 
practice and similarity between stimuli in different lists. 
Finally, in SAL III there is a push-down stack of responses at each terminal 
node of the discrimination net. Recall that when an error occurs there is some 
probability that the discrimination net will not be changed but that the 
correct response will be stored at the terminal node reached in the series of 
tests. In SAL III the new response is placed at the top of a stack, and responses 
stored earlier occupy lower positions. The stack is subject to a process of 
change over time during which responses from lower positions in the stack 
replace responses that arrived in the stack more recently; this produces 
proactive interference effects. An explanation of some effects of the 
type of test given, such as the difference between recognition and recall, was 
obtained by assuming that SAL III scans the complete response stack at a 
terminal node when it is appropriate to do so. 
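
The stochastic choice made after an error can be sketched as follows. The function name and the returned labels are hypothetical, and the sketch covers only the decision governed by the probabilities a and b described above, not the construction of the discrimination net itself.

```python
import random

def sal_error_update(a, b, rng):
    """Decide what happens after an error on one trial.

    With probability a a test node is added; otherwise, with probability b
    the stored response is replaced by the correct one; otherwise nothing
    changes on this trial.
    """
    if rng.random() < a:
        return "add_test_node"
    if rng.random() < b:
        return "replace_stored_response"
    return "no_change"

rng = random.Random(0)   # seeded so that a simulated run can be repeated
outcomes = [sal_error_update(0.3, 0.5, rng) for _ in range(1000)]
print({kind: outcomes.count(kind) for kind in sorted(set(outcomes))})
```

Running the same parameter values with different seeds gives different response sequences, which is one way to think about how a stochastic model yields an estimate of variability between simulated subjects.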

In addition to finding that the idea of stimulus discrimination as realized 
in a discrimination net has explanatory power for a very large set of experi- 
mental phenomena, Hintzman also discovered that a number of phenomena 
probably depend on additional processes. Effects of list length cannot be 
explained with the discrimination process of SAL and may be related to 
limitations of short-term memory. Also, SAL provides no explanation of 
negative transfer, and it may be necessary to postulate storage of list-iden- 
tifying information to explain transfer effects, although EPAM can explain 
negative transfer in A-B, A-C by postulating that storage of new response 
images requires a large amount of time. 

An application of EPAM has been developed by Simon and Gilmartin 
(1973) for a task involving retrieval of information from a very large struc- 
ture of stored knowledge. Simon and Gilmartin gave an analysis of pattern 
recognition in chess. It is known (Chase & Simon, 1973; deGroot, 1966) that 
chess masters have exceptional ability in a task requiring fast encoding of 
complex information about a chess position. If a board is arranged with pieces 
from a position that could be reached during a game, then shown to a master 
player for a few seconds, the player often can accurately replace nearly all 
the pieces and generally can accurately replace as many as 15-18 pieces—well 
beyond the number of discrete elements that a person can hold in short-term 
memory. Simon and Gilmartin explained this ability on the basis that the 
master player perceives patterns rather than individual pieces, and recognizes 
the patterns by using information stored in the form of an EPAM net. Simon 
and Gilmartin's system, called MAPP (for Memory-Aided Pattern Perceiver), 
learned somewhat fewer than 600 patterns, each with an average of two to 
three pieces, with the information stored in a network containing somewhat 
fewer than 2000 nodes. This memory store was then used to simulate the 
performance of a subject whose task was to reproduce a sample of board 
positions after brief viewing. MAPP succeeded in reproducing about 50% of 
the pieces in the positions used. This was not as good as the performance of a 
chess master, who replaced an average of 74%, but was somewhat better 
than performance of a Class A player, who replaced an average of 43%. It 
was estimated that a system organized like MAPP could perform at the level 
of the chess master if it had a network permitting recognition of some 
thousands, or perhaps tens of thousands, of patterns. 


ASSOCIATION AND STRUCTURE 


Any theory that describes an organized system specifies components of the 
system and relations among the components. Thus, all theories about 
organization must be about association, at least in a general sense. However, the 
theory we are calling the associationist theory has made a specific claim 
regarding the organization of memory: that organization of knowledge 
consists of a network of simple connections between ideas, or between 
stimuli and responses. Connections have been assumed to vary in strength, 
but only in strength. In other words, associationist theory states that memory 
can be represented as a graph in which ideas, or stimuli and responses, are 
denoted by nodes, and the connections between nodes vary only in strength. 

Theories about memory structure of the kind that we have reviewed here 
differ from the traditional associationist theory in one fundamental way: 
they assume that associations in memory are of different kinds. Theories of 
semantic and factual memory specify different kinds of relations among com- 
ponents of knowledge, and concepts are associated because they enter into 
those relations. In theories of pattern recognition, the various component 
features of concepts are linked by branches that depend on the features found 
in the stimulus.¹ 
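
The contrast between the two kinds of memory structure can be made concrete with a small sketch. Both data structures and the function `related_by` are hypothetical illustrations; the strength values are arbitrary numbers, not data.

```python
# Associationist view: connections differ only in strength.
strength_only = {
    ("plant", "structure"): 0.9,
    ("plant", "leaf"): 0.7,
}

# Relational view: each connection also records what kind of relation it is.
typed_relations = [
    ("plant", "member_of", "structure"),
    ("plant", "with", "leaf"),
    ("people", "use", "machine"),
]

def related_by(relations, kind):
    """Return the ordered pairs of concepts joined by one kind of relation."""
    return {(a, b) for a, k, b in relations if k == kind}

print(related_by(typed_relations, "member_of"))
```

A query such as "which concepts stand in the membership relation" has a direct answer in the typed structure, whereas the strength-only graph has no place to record the kind of connection.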

It seems to us that there are two views that an associationist might reason- 
ably take regarding current theories about the organization of memory. One 
is simple skepticism. It might be argued that the structures and mechanisms 
assumed in recent theories are wrong—or at least unnecessarily complex. 
Such an argument would be tantamount to belief that a theory based on 
simpler principles would be closer to the truth. We disagree with this view; 
in our judgment the weight of evidence strongly favors theories whose basic 
principles are relatively more complex. But we will not try to support that 
general position here. The task of developing and evaluating a general theory 
of human memory is not in the domain of this book. 

The second response available to an associationist is to maintain that the 
basic mechanism of learning is the formation of simple connections, and that 
the more complex structures found in semantic and factual knowledge, in 
complex pattern recognition, and in other complex cognitive domains involve 
complex combinations of these basic connections. According to this view, it 
could be accepted that analysis of knowledge structures seems to lead to the 
idea of different kinds of associations corresponding to various relations 
among components, but this diversity is only apparent. It would be expected 
that an appropriate analysis would show how these complex relations are 
composed of simple connections that vary only in strength. 

¹Pattern-recognition systems like Pandemonium, CLS, and EPAM can actually be 
... if the systems are assumed to receive ... concepts (e.g., Huesmann & 
Cheng, ...). 

It is the second associationist response that this book attempts to dis- 
courage. According to an associationist view, formation of connections is 
simple; understanding relations is complex. We take the opposite view: 
relating cognitive elements is basic and simple, but learning associations 
between apparently unrelated elements involves a relatively more complex 
process that can best be understood as a composition of processes. Various 
aspects of building a relational structure are the component processes in 
associative learning, not the other way around. 


NATURE OF ASSOCIATIONS 


The central assertion in the theory we are presenting is: Formation of an 
association consists of storing in memory a new structure in which the 
associated elements are in some way related. As Bower (1972a) and Köhler (1947) 
indicated, the relation may be only that the two elements occurred together 
in the same context. However, mere proximity in place and time provides a 
relatively weak relational connection; stronger learning should occur if there 
is some more substantial basis for relating the elements. 

Some experiments have compared associations with terms that could be 
related easily to others that seem less easy to relate. According to the hypothe- 
sis that associative formation is basically a process of finding relations, asso- 
ciations should be easier to memorize when there is a stronger basis for relat- 
ing the items. 

In an early contribution to the Gestalt theory of association, Köhler (1941) 
showed that, in a list of items with some numeric terms and some alphabetic 
terms, learning was faster on pairs with both stimulus and response terms 
from the same class than on pairs with the stimulus from one class and the 
response from the other. Köhler presented this result as an illustration of the 
importance of organization in learning, citing the Gestalt principle of 
grouping by similarity as the probable basis of the effect. 

However, further analysis shows that effects of this kind are open to 
another interpretation. Postman and Riley (1957) responded to Köhler's 
argument, noting that before learning an item, subjects could be selecting 
responses in a biased way, preferring responses that are in the same class as 
the stimulus, and they presented evidence that such biases exist. On Postman 
and Riley's hypothesis, Köhler's finding was due not to differential ease of 
learning the similar and dissimilar items, but to differences in guessing biases 
prior to learning that produced more correct responses to unlearned similar 
items than to unlearned dissimilar items. 


Data that seem to avoid Postman and Riley's objection were presented by 
Asch (1969), who manipulated the conditions of presentation of items rather 
than the properties of items themselves. Asch studied the association 
between properties, such as a form having the shape of a triangle, a line being 
composed of small circles, and other features of geometric stimuli. In one 
condition, Asch presented the properties to be associated as features of single 
objects. An example would be a triangular shape formed by three lines of 
small circles. In the other condition, the properties were presented but not 
integrated in that way. A triangle made of ordinary lines was presented beside 
a line composed of small circles. Asch's idea is that when two properties are 
features of the same object, they will be easily organized into an integrated 
unit, but that when the properties are presented side by side, it would be more 
difficult to do so. In tests of retention Asch obtained large differences favoring 
the condition with integrated presentation of properties. 

An associationist analysis of this kind of effect probably is possible, but at 
least the argument of response bias prior to learning does not seem to apply. 
Asch's demonstration is especially interesting in that it relates directly to one 
of the original uses of the concept of association: explanation of the 
development of complex concepts (such as "table") based on association between 
properties that are experienced together. The simplest interpretation of Asch's 
result is that when we learn that an object has several properties, we learn it 
partly because the properties are all parts of the same object, and are 
therefore experienced in an integrated and unified way. But on this interpretation, 
principles of cognitive organization (in this case, perceptual principles) are 
used to explain an aspect of associative learning, whereas the associationists 
were attempting to explain cognitive organization using principles of 
associative learning. 

In many ways, the most compelling demonstrations of the potency of 
organizational factors in associative learning come from experiments dealing 
with elaborative coding strategies. An especially strong demonstration was 
given by Jensen and Rohwer (1963), studying associative learning by adult 
retardates. The task involved associating pictures of objects like a hat and a 
table. When one of the pictures was presented (say, the hat) the subject was 
to select a lever under the picture of the associated picture (the table) from a 
set of pictures. Jensen and Rohwer reported that many of their subjects had 
great difficulty in learning the pairings, sometimes working for many daily 
sessions without making noticeable progress. But a dramatic change in 
learning occurred when subjects were instructed to form sentences involving 
the objects. (For example, “The hat is on the table.") When subjects were 
taught to use this strategy, learning occurred at a reasonable rate. 

Many experimental demonstrations have shown that subjects can be greatly
aided in memorizing associations by use of appropriate kinds of cognitive
additions to the material being memorized. The material added by the subject
may be verbal, such as the sentences used by Jensen and Rohwer's subjects
(also see Adams, 1967), or it may be pictorial (see Bower, 1972b; Paivio,
1971). Evidence from experiments indicates that the main use of a pictorial
image comes in integrating the elements to be associated as parts of an
interacting scene, thus performing an organizational function similar to that
discussed by Asch (see Bower, 1970; Wollen & Lowry, 1971).

Evidence of strong facilitation based on elaborative encoding supports the
importance of organization in associative learning. In many situations where
elaborative encoding helps, it would seem that simple associative learning
should be easier without it. Suppose that a subject is to learn to associate the
words "wheel" and "pencil." Learning is made easier by using an image of a
wheel with spokes made of pencils. But in terms of associative connections,
the subject who learns by using imagery has associated the pictorial
representation with the stimulus (for example, imagining a wheel pictorially in
response to reading the word "wheel") and the response with the image (say
"pencil" in response to the pictorial image). Why should this be easier than
simply connecting the word "pencil" with the word "wheel"? The reason
seems to come naturally from the Gestalt idea that association is a form of
cognitive organization. The image provides a way of forming an integrated
unit of the two elements to be associated.


RETRIEVAL OF ASSOCIATIONS 


We argue that in storin 
pattern. Under this assumption

An example is shown in Figure 2-7. The list to be learned is shown, along
with a set of feature tests that would identify the associations as a result of 
testing features of the stimuli. The questions contained in the diamonds are 
tests carried out by the pattern recognizer, and identification occurs when the 
sequence of tests and results arrives at a terminal node. Since the system can 
be represented as a set of nodes denoting feature tests and patterns, con- 
nected by a set of links denoting the sequence of carrying out tests, the system 
has the form of a network. We will refer to systems like Figure 2-7 as re- 
trieval networks (or retrieval systems). 

We think that the selection and arrangement of features in a retrieval 
network are determined mainly by two factors. First, if the items all have 
different responses, the features included in the network must provide for 
discrimination among all the associations. The subject cannot give correct 
responses to all the items if two different stimuli are sorted to the same termi- 
nal node. (Of course, if different stimuli are paired with the same response, the 
subject can benefit from grouping the items. We will discuss the positive 
transfer that can occur from such grouping in Chapter 5.) Since discrimination
among patterns is a minimal requirement for successful retrieval, Feigenbaum
(1963) referred to the structures built by EPAM as discrimination nets, and
Hintzman's (1968) simplified version of EPAM, mentioned earlier, was based
solely on its capability for discriminating among stimuli. (The general problem
of discriminating among patterns is a topic in the formal theory of pattern
recognition; see, for example, Minsky & Papert, 1969.)

Figure 2-7 Hypothetical retrieval network for a list of paired associates.
(Feature tests legible in the network include "INITIAL GLOTTAL?" and "INITIAL
FRICATIVE?")

However, the requirement of discriminability does not determine all the
characteristics of a recognition network: many different arrangements of
feature tests would provide complete discrimination among the items shown
in Figure 2-7, or any list of associations. We propose, therefore, as a second
factor, that the arrangement of the network must also be influenced by a need
for efficiency of retrieval. Different arrangements of feature tests can produce
simpler or more complex networks. A simpler network has two advantages:
first, it involves less information that has to be held in memory, so it should
be easier to retain; second, use of a simpler network requires fewer steps, and
therefore retrieval can take place more quickly and easily.


One way to arrange a network for efficient retrieval is to take advantage of
features that are shared among subsets of stimuli. Suppose one stimulus
feature is shared by half the items in a list. Testing for that feature
enables the system to eliminate half the items. More generally, a simpler
network will result if stimulus features shared by subsets of items are tested
early in the recognition process. This corresponds to having these shared
features located in relatively high-level positions in a graph such as Figure
2-7.


Although the features needed to retrieve associations must be properties of
stimuli, we expect that properties of responses will also influence the
selection of features in the network and their arrangement. A major reason for
this opinion is our view that individual associations are generally represented
in a way that involves a relation between the stimulus and response terms.
Concurring with Selz (1913) and later Gestalt theorists, we have concluded
that in learning associations, a representation of the stimulus-response pair
is generally stored in memory, after which features of the stimulus are
incorporated into a retrieval system. But this means that the features most
likely to be selected are those that are incorporated in the relational
representation of the stimulus-response pair. It is also likely that relations
among responses can be utilized in developing an efficiently organized system
of retrieval. An illustration is included in Figure 2-7, where the items sorted
under "initial glottal" have the responses 1 and 2. Relationships among
responses as such cannot make the retrieval network more efficient. However,
it seems likely that if items have responses that are interrelated, subjects
will more probably notice relationships among their stimuli, and therefore will
more probably incorporate those interstimulus relationships into the retrieval
networks they develop for the list.

The hypotheses we have given here about retrieval systems imply that orga- 
nizational factors are important in associative learning, as they have been 
shown to be in free-recall learning (Mandler, 1967; Tulving, 1962; Wood, 
1972). With a few exceptions, theories of learning associations have neglected 
organizational processes, although Battig (1968) recognized that subjects do 
acquire groupings of items as they learn the list. Our view that an organized 
retrieval system is developed by subjects plays an important role in our under- 
standing of phenomena of negative transfer and forgetting, which we discuss 
in Chapters 6, 7, and 8. However, if we consider only the process of acquisi- 
tion, there seem to be fairly strong theoretical reasons for expecting organiza- 
tional factors to play a significant role in associative learning. 


SUMMARY 


This chapter has presented an overview of the theory of association developed
from studies reported in later chapters. The theory consists of two major
claims: First, storing an association in memory consists of forming a new
cognitive unit that includes the stimulus and response terms as components,
and will generally occur through the finding of a relational connection that
links the terms. Second, retrieval of associations occurs through a process of
pattern recognition based on an organized system of tests on stimulus
features.

The first claim, that associations correspond to relational structures rather
than connections between otherwise independent entities, violates the
fundamental claim of associationist theory, namely that associations form
through the coincidence of ideas. To form a relational structure, the learner
must be able to recognize the relation used to form the structure. In assuming
that learning of associations is mainly a process of identifying relations
rather than forming connections, we think our theory agrees closely with the
claims made by Gestalt theorists and with recent theories about the
organization of knowledge in memory, in which concepts are stored in complex
relational structures.

The second claim, that retrieval is based on a system of tests of features,
departs from both classical associationist theory and Gestalt theory. The idea
that subjects acquire representations involving a rich set of interitem
relationships seems consistent with the spirit of Gestalt theory; we can hardly
imagine Köhler or Koffka arguing against the idea of a relational
representation of an entire list. However, the concepts in this theory of
retrieval have not come from analyses given by Gestalt theorists, but rather
from the modern theory of pattern recognition.

Because of the strong continuity between our theory and the Gestalt tradition,
we would consider it quite appropriate if our theory were to be called a
Gestalt theory of association. On the other hand, Gestalt concepts are
combined with concepts developed within a theoretical framework of information
processing, with roots in computer science. A more accurate label,
historically, would then be a cognitive theory of association.


In traditional associationist theory, especially in its behaviorist forms, the
outcome of learning has been conceptualized as a change in a person's
tendency to do something. Association has been considered a conditional
relation between stimulus and response (Martin, 1972; Postman, 1972). In this
way of thinking, when a person is learning an association between A and B,
the strength of connection between A and B is increased, and this corresponds
to an increasing tendency to perform B whenever the person perceives the
stimulus A.

Contemporary theories of memorizing take a different point of view. In
recent analyses, memorizing has been conceptualized as a process of storing
information; remembering, as a process of retrieving stored information. A
rich set of concepts and hypotheses was developed during the 1960s by
experimental psychologists who analyzed performance in a variety of
memorizing tasks and who formulated their conclusions in terms of hypothetical
models about processes of storing information in memory.

In Chapter 2, we sketched our main hypotheses about the process of
memorizing associations. In our view, the main questions to be considered about
associative learning concern the nature of information that is stored in
memory when a person has acquired new associations between ideas. Thus, our
theory of associative learning belongs in the recent information-processing
tradition rather than in the behaviorist tradition of analyzing
stimulus-response contingencies.


In this chapter, we will review some major contributions to the theory of
memory storage based on concepts of information processing. Many of the
empirical studies that have been used have involved memorizing individual
items and lists of items, rather than memorizing associations. In this respect,
the present chapter is a digression from the development of our main analysis,
which focuses on the learning of associations. But theories of recognition
memory and recall are important to the conceptual framework in which we
have developed our hypotheses about associative learning, and thus the
present chapter does provide relevant substantive background for the remainder
of the book.

Another reason for presenting the material in this chapter is technical. We
give special emphasis to models formulated as Markov chains, and we describe
statistical methods for Markov models in some detail. Analyses presented
throughout the book depend on measurements of the difficulty of various
aspects of learning, and these measurements will consist of estimates obtained
from Markov models of the kind we present here. This chapter presents the
basic statistical methodology that is later used in testing substantive
hypotheses about the process of learning associations, and about transfer
between different associative learning tasks.


ALL-OR-NONE LEARNING 


We begin with the simplest case, in which learning of each item involves a
single transition between states. Then learning is an all-or-none event, and a
subject's state of knowledge about an item is that either it is unknown or it
has been learned. The graph in Figure 3-1 shows the two states.

Empirical results agreeing with the all-or-none model have been obtained
in two kinds of memorizing experiments: recognition memory and simple
associative learning with a small number of response alternatives. In an
experiment on recognition memory, a list of several items is presented, one
item at a time. Then a test is given, in which each item in the original list is
presented again, but this time intermixed with new items. The subject's task
is to indicate whether or not each item was in the original set. Again the
original list is presented alone, then recognition is tested a second time by
intermixing a new set of items with those being learned. By continuing in this
way, the items to be learned are shown repeatedly until the subject correctly
recognizes all of them.


Figure 3-1 Graphical representation of all-or-none learning.

In simple paired-associate memorizing, a similar procedure is used. A set of
pairs is constructed, for example, stimuli of simple drawings paired with
numerical responses. The pairs are studied, then in a test each stimulus is
shown and the subject is asked to give the paired response. The pairs are then
shown again, followed by tests on the stimuli. This procedure is repeated until
the subject can give the correct responses for all items.

In experiments of these kinds, the items are usually unknown at the
beginning. At this point each item is in State U. The all-or-none hypothesis is
that each time an item is presented, there is a fixed probability c that the
item will become learned; that is, with probability c the item transits to
State L. If learning does not occur, the item remains in State U, and will have
the same probability of becoming learned on the next trial. As long as the item
is unlearned, the probability of a correct response on a test is a constant g,
presumably the probability of guessing correctly. It is assumed that the
probability of correct response in the learned state is 1.0, but since subjects
are likely to occasionally fail to give responses they know, a criterion of
about five successive correct responses on an item is used in analyzing the
data to indicate the learned state; errors that occur after the criterion has
been reached are ignored.
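
The assumptions just listed are simple enough to simulate. The sketch below
is illustrative only: the function names are ours, and it assumes that each
trial consists of a test followed by a study opportunity, so that the first
test is a pure guess (the convention that matches the sequence probabilities
worked out later in this chapter). The parameter values c = .25 and g = .30
are chosen for illustration.

```python
import random


def simulate_item(c, g, rng):
    """Return the responses (1 = error, 0 = correct) made while in State U."""
    responses = []
    while True:
        # Test in State U: the response is correct only by guessing.
        responses.append(0 if rng.random() < g else 1)
        # Study: the item moves to State L with probability c, after which
        # every response is correct and nothing more needs to be recorded.
        if rng.random() < c:
            return responses


def mean_errors(c, g, n_items=20000, seed=0):
    """Average number of errors per item over many simulated items."""
    rng = random.Random(seed)
    total = sum(sum(simulate_item(c, g, rng)) for _ in range(n_items))
    return total / n_items


print(mean_errors(0.25, 0.30))
```

Under these assumptions the simulated mean number of errors per item should
fall close to (1 − g)/c, which is 2.8 for the values used here.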


When the model is correct, we can use it to measure the difficulty of
learning. In the simple all-or-none model, the difficulty of learning is
indicated by the value of c; when c is large, learning is easy, and when c is
small, learning is more difficult. For example, Kintsch and Morris (1965) found
that the all-or-none model agreed with data of recognition memorizing in two
conditions: one in which a list of 10 nonsense syllables was learned, and one
in which a list of 15 nonsense syllables was learned. For the 10-item list, the
estimate of c was .37; the interpretation is that each time an item was studied
there was probability .37 that information about the item would be stored in
memory in a way that would permit the item to be recognized throughout the
experiment. With the longer 15-item list, the memorizing task was somewhat
harder: the estimate of c in that condition was .28. In later chapters, we will
be considering more complex models and using the values of parameters like c to
estimate the difficulty of different stages of the learning process.
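
One way to read these two estimates is in terms of the expected number of
study presentations needed to learn an item. This is a short worked
calculation, not a result reported by Kintsch and Morris; N is a symbol
introduced here for the number of presentations up to and including the one on
which learning occurs, which is geometrically distributed under the
all-or-none assumption.

```latex
E(N) = \sum_{n=1}^{\infty} n\,c\,(1-c)^{n-1} = \frac{1}{c},
\qquad \frac{1}{.37} \approx 2.7,
\qquad \frac{1}{.28} \approx 3.6 .
```

On this reading, an item in the 15-item list required on average about one
more study presentation than an item in the 10-item list.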

If the assumptions of a model are not at least approximately correct, the
parameter values that could be estimated would have no meaning. Therefore,
there are two important practical questions about the statistical methods for
Markov models. First, how do we obtain the estimates of parameters such as
c? Second, how do we decide whether a model is applicable for the analysis
of a set of data?

These questions are answered by the same general approach. From a model
such as the all-or-none model, we can derive formulas that show probabilities
of various aspects of data. These probabilities are functions of the parameters
of the model. Then using empirical observations, we can use some of these
formulas to estimate the parameters of the model: if the model is correct,
these estimates provide measurements of the difficulty of learning. Having
obtained the numerical estimates, we then test whether the data agree with
predictions that are calculated from other formulas. Usually, estimates are
obtained from some general summary properties of data, such as the mean
number of errors and the mean number of trials before criterion.

Predictions to test the model usually involve more detailed properties of 
data. For example, a formula can be derived for the probability distribution 
of the number of errors. The numerical estimates obtained from summary 
statistics give a predicted frequency distribution of the number of errors per 
item. This predicted frequency distribution is compared with the frequencies 
in the data, and if the two agree to an acceptable extent, the model is judged 
applicable for those data. 

The analytic power gained by representing a learning process as a Markov
chain results from the possibility of calculating the probability of any
sequence of events that can occur in an experiment. This can be illustrated
easily in relation to the all-or-none model. When an item is tested, there are
three possibilities: an item may be learned, denoted by L; it may be unlearned
but guessed correctly, denoted by G; or it may be unlearned and an error
given, denoted by E. One possible sequence in such an experiment is

G E E G E G L L L . . .

This sequence has probability

$$g \cdot (1-c)(1-g) \cdot (1-c)(1-g) \cdot (1-c)g \cdot (1-c)(1-g) \cdot (1-c)g \cdot c = (1-c)^5 (1-g)^3 g^3 c.$$

The events that can be observed in data are correct and incorrect responses.
Let 1 denote an error and let 0 denote a correct response. One possible
sequence of data is

0 1 1 0 1 0 0 0 0 0 . . .

Since correct responses due to guessing cannot be distinguished from correct
responses that occur because learning has occurred, the probability of a
response sequence corresponds to probabilities of several theoretical
sequences. The response sequence listed above has probability

$$g \cdot (1-c)(1-g) \cdot (1-c)(1-g) \cdot (1-c)g \cdot (1-c)(1-g) \cdot \left[ c + (1-c)gc + (1-c)^2 g^2 c + \cdots \right],$$

where the sum of terms inside the brackets involves probabilities of learning
immediately after the last error ($c$), or failing to learn but guessing
correctly, then learning ($(1-c)gc$), or failing to learn twice and guessing
both times, then learning ($(1-c)^2 g^2 c$), and so on. The infinite series
inside the brackets can be simplified, giving the result

$$P(01101000\ldots) = (1-c)^4 g^2 (1-g)^3 \left( \frac{c}{1 - g + gc} \right).$$

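
The same probability can be obtained for any response sequence by carrying
forward, trial by trial, the joint probability of the responses so far and the
current state. The sketch below is one way to do this under the assumptions of
the all-or-none model; the function names are ours, and the input is the part
of the sequence up to any chosen point, with all later responses taken to be
correct.

```python
import math


def response_sequence_probability(responses, c, g):
    """Probability of `responses` (1 = error, 0 = correct), then no more errors."""
    in_u, in_l = 1.0, 0.0  # joint probability of the data so far and each state
    for r in responses:
        if r == 1:
            # An error can only come from State U.
            in_u, in_l = in_u * (1 - g), 0.0
        else:
            # A correct response is a guess in State U or certain in State L.
            in_u, in_l = in_u * g, in_l
        # Study opportunity after the test: U -> L with probability c.
        in_l += in_u * c
        in_u *= 1 - c
    # From State U, the chance that every later response is correct is
    # g * c / (1 - g + g*c); from State L it is 1.
    return in_l + in_u * g * c / (1 - g + g * c)


def closed_form_example(c, g):
    """Closed form for the sequence 0 1 1 0 1 0 0 0 ... discussed in the text."""
    return (1 - c) ** 4 * g ** 2 * (1 - g) ** 3 * c / (1 - g + g * c)


p = response_sequence_probability([0, 1, 1, 0, 1], 0.25, 0.30)
print(math.isclose(p, closed_form_example(0.25, 0.30)))
```

Appending explicit zeros to the input leaves the result unchanged, since the
function already treats every response after the listed ones as correct.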
The probability of any kind of response sequence can be computed as a 
function of the parameters. In principle, then, the assumptions of the model 
could be tested by computing the theoretical probability of each possible 
kind of response sequence and comparing the results with the proportions 
obtained in the experiment. In practice, experiments are far too small to 
permit meaningful comparisons at this level of detail. Therefore, certain 
summary statistics are used in evaluating the model. Two that are often used 
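are the trial of last error on an item, denoted L, and the total number of
errors per item, T. These have distributions

$$P(L = k) = \begin{cases} \dfrac{gc}{1 - g + gc}, & k = 0, \\[2ex] \dfrac{(1-g)\,c\,(1-c)^{k-1}}{1 - g + gc}, & k \geq 1, \end{cases} \tag{3-1}$$

$$P(T = j) = \begin{cases} \dfrac{gc}{1 - g + gc}, & j = 0, \\[2ex] \dfrac{1-g}{1 - g + gc} \left[ \dfrac{(1-g)(1-c)}{1 - g + gc} \right]^{j-1} \dfrac{c}{1 - g + gc}, & j \geq 1. \end{cases} \tag{3-2}$$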


Derivations of these and several other summary statistics were given by
Bower (1961). Similar derivations with some attention to general methods of
calculation can be found in Atkinson, Bower, and Crothers (1965), in Laming
(1973), and in Restle and Greeno (1970).
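
The distributions of the trial of last error and of the total number of errors
can be tabulated numerically as a check on the algebra. The sketch below
writes both in terms of b = c/(1 − g + gc), the probability that no further
errors follow a trial on which the item is still unlearned. The symbol b and
the function names are ours, and the sketch assumes the all-or-none model
with the first test given before any study.

```python
def no_more_errors(c, g):
    """b: probability of no further errors, starting from State U after a test."""
    return c / (1 - g + g * c)


def p_last_error(k, c, g):
    """P(L = k): the last error occurs on trial k (k = 0 means no errors)."""
    b = no_more_errors(c, g)
    if k == 0:
        return g * b
    # Still in State U on trial k, an error there, and no errors afterwards.
    return (1 - c) ** (k - 1) * (1 - g) * b


def p_total_errors(j, c, g):
    """P(T = j): exactly j errors are made on the item."""
    b = no_more_errors(c, g)
    if j == 0:
        return g * b
    # At least one error, then (j - 1) further errors, then none.
    return (1 - g * b) * (1 - b) ** (j - 1) * b


c, g = 0.25, 0.30
mean_t = sum(j * p_total_errors(j, c, g) for j in range(2000))
mean_l = sum(k * p_last_error(k, c, g) for k in range(2000))
print(mean_t, mean_l)
```

For any c and g the tabulated means should agree with (1 − g)/c for the number
of errors and (1 − g)/[c(1 − g + gc)] for the trial of last error.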

As we have mentioned earlier, a mathematical model has two uses. It
provides a structural description of the process being studied, and as such it
is a hypothesis that must be evaluated. It also is a basis of measurement; in a
learning model, the model has parameters whose values indicate difficulty of
learning. Both uses require estimation of the values of parameters of the
model. Various procedures for estimation are possible; some discussion will
be given in Chapter 4. For the present discussion, it is sufficient to note that
numerical values are required and can be obtained. Numerical values are
required if equations like those given above are to be compared with data.
Empirical proportions of sequences with zero, one, two, . . . errors can be
observed. These can be compared with theoretical proportions calculated
from Equation 3-2, but numerical values must be substituted for the
parameters g and c. One reasonably good method of estimation uses the observed
mean of one or more statistics in the data, such as the number of errors per
item. From Equation 3-2 it can be shown that



$$E(T) = \frac{1 - g}{c}, \tag{3-3}$$

and if the value of g is known, an easy estimate of c is obtained by
substituting the observed mean number of errors for E(T):

$$\hat{c} = \frac{1 - g}{E(T)}. \tag{3-4}$$

If g is not known, then additional data must be used in estimation. The mean
trial of last error, derived from Equation 3-1, is

$$E(L) = \frac{1 - g}{c(1 - g + gc)}. \tag{3-5}$$

Equations 3-3 and 3-5 can be solved simultaneously to obtain estimates of g
and c.

To illustrate these methods, consider an experiment in paired-associate
memorizing by Polson, Restle, and Polson (1965). The stimuli used are shown
in Figure 3-2. Note that eight of the stimuli are distinctive; the other eight
form four pairs of highly similar drawings that Polson et al. called twinned
stimuli. The responses were five short words (cost, hope, part, rush, and only)
that the subjects memorized before working on the pairs. Responses were
assigned randomly to the 16 stimuli, with the restriction that the two stimuli
of each pair had different responses.

Figure 3-2 Stimuli used in paired associates by Polson, Restle, and Polson
(1965).

The experiment of Polson, et al., was based on Restle’s (1964) idea that 
paired-associate memorizing would become a more complicated process when 
similarity between stimuli substantially increases the difficulty in discrimi- 
native learning. The items with distinctive stimuli might be learned in an 
all-or-none fashion. But then it would be expected that the learning of items 
with twinned stimuli should involve another stage, in which the finer stimulus
discriminations would be acquired. In other words, learning a distinctive pair 
would be a relatively simple process of storing a memorable representation, 
and it would not be surprising if this were an all-or-none process. But in 
learning an item with a twinned stimulus, subjects might often store a repre- 
sentation that did not include the features needed to distinguish between the 
two twinned items. In that case, the first representation in memory would 


have to be changed by adding additional features to avoid confusions be- 
tween the twinned items. 

Figure 3-3 Theoretical and empirical distributions of number of errors for
individual paired-associate items.

36 Processes of Storing Information in Memory

For the distinctive items, the mean number of errors was 2.80, and the mean
trial of the last error before criterion was 3.61. Using Equations 3-3 and
3-5, the estimated parameter values are ĉ = .25, ĝ = .30. When these values
are substituted in Equation 3-2, the function drawn in Figure 3-3 is obtained.
The empirical distribution of the number of errors per item is shown in
Figure 3-3 by solid dots. The empirical and theoretical distributions of the
trial of last error are shown in Figure 3-4. Both comparisons show that the
data and the theoretical predictions were in good agreement for these items.
(In the present discussion evaluation of goodness of fit, like estimation of
parameters, is informal. More rigorous methods will be used in later
discussions and will be defined as they are introduced.)
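
The simultaneous solution for the distinctive items can be reproduced
directly. The sketch below is one way to carry it out, not necessarily the
procedure used by the original authors. It assumes the model's expressions
for the two means, E(T) = (1 − g)/c and E(L) = (1 − g)/[c(1 − g + gc)];
eliminating g leaves a quadratic in c. The function name and the choice of
the smaller root are ours. For these data the larger root exceeds 1 and so
cannot be a probability.

```python
import math


def estimate_c_and_g(mean_errors, mean_last_error):
    """Solve E(T) = (1-g)/c and E(L) = (1-g)/(c(1-g+gc)) for c and g."""
    t, l = mean_errors, mean_last_error
    # Substituting g = 1 - c*t into 1 - g + g*c = t/l gives
    #   t*c**2 - (t + 1)*c + t/l = 0.
    discriminant = (t + 1) ** 2 - 4 * t * t / l
    if discriminant < 0:
        raise ValueError("the two means are not consistent with the model")
    c = ((t + 1) - math.sqrt(discriminant)) / (2 * t)
    g = 1 - c * t
    return c, g


c_hat, g_hat = estimate_c_and_g(2.80, 3.61)
print(round(c_hat, 2), round(g_hat, 2))
```

With the observed means 2.80 and 3.61, the solution rounds to the values
reported in the text, .25 for c and .30 for g.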

It was not expected that the data from the twinned items would agree with 
the all-or-none model. However, it is important to understand that testing a 
model involves a kind of sorting procedure—some sets of data agree with a 
model, others do not. The twinned items had an average of 5.96 errors. If we 
assume that g was .30, as estimated for the distinctive items, then Equation 
3-4 gives the estimate ĉ = .117 for the twinned items. This gives the function
shown in Figure 3-5. The data from the twinned items shown in Figure 3-5
clearly disagree with the predicted frequencies, confirming the expectation 
that learning the twinned items would be a more complicated process—not 
just a slower one. Since the data show the all-or-none model is not a correct 
description of the learning of these items, it would be a mistake to attribute 
any meaning to the value of the learning parameter estimated from these data. 
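
For reference, the arithmetic behind the estimate for the twinned items is as
follows, with the observed mean of 5.96 errors written as $\bar{T}$ (a symbol
introduced here) and g fixed at .30:

```latex
\hat{c} = \frac{1 - g}{\bar{T}} = \frac{1 - .30}{5.96} = \frac{.70}{5.96} \approx .117
```

The number is easy to compute, but as noted above it has no interpretation
unless the model itself fits the data.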

The purpose of comparing data and theoretical predictions like those in 
Figures 3-3, 3-4, and 3-5 is to provide a test of assumptions in the model 
being considered. Note that this test uses considerably more information 
from the data than is often the case. In most psychological experiments the
mean score of some statistic like total errors is used to measure performance,
and analysis of variance is carried out to judge the importance of effects
produced by experimental variables. In the analysis of variance, variation
among individuals in a single condition, or among measurements taken on the
same individual, is considered noise, often called error variance. In that
framework the amount of such variation is not informative about theoretical
questions and is generally taken as a quantity that should be made as small as
possible. In contrast to the analysis of variance, analysis based on a specific
quantitative model uses the entire distribution of a variable, such as number
of errors or trial of last error, as empirical information to be used in making
theoretical inference.

Figure 3-4 Theoretical and empirical distributions of trial of last error for
individual paired-associate items, from Polson et al. (1965).

Figure 3-5 Theoretical and empirical distributions of number of errors for
twinned items, from Polson et al. (1965).
epresents only one way of 


The statistical methodology described here represents only one way of
testing hypotheses of the kind represented in the all-or-none model. Regarding
the hypothesis of all-or-none learning of simple associations, Estes (1960)
gave evidence that if an item was missed on one test, the probability of a
correct response on a second test was equal to chance guessing. This supports
the idea that items have two states of performance that we have called
unlearned and learned. Rock (1957) compared a condition where a list was
learned in the normal way with a condition where any item that was missed
on a test was replaced by a new item. Subjects in the replacement condition
achieved a criterion of no errors as quickly as subjects in the control
condition, supporting the idea that as long as an item was not learned, its
probability of being learned remained constant even though it had several
previous study trials. There is considerable literature on whether simple
associative learning is all-or-none (e.g., Postman, 1963b; Restle, 1965;
Underwood, Rehula, & Keppel, 1962). Clearly, it often is not. But we conclude
that the weight of evidence favors the idea that learning proceeds through a
series of discrete changes, rather than gradual strengthening of traces or
connections (cf. Crowder, 1976, chap. 9), and that the all-or-none model
provides a good approximation of the course of learning in the simplest cases,
in which only one discrete transition is required for a learning criterion to be
achieved.


ANALYSIS OF SHORT-TERM RETENTION 


The all-or-none model gives a useful statistical description of simple
learning, based on transitions occurring between successive trials. However,
the model says virtually nothing about the mechanisms of information processing
that result in storage of information. A great deal is now known about
processing that occurs when items are studied. A composite of most theories is
summarized in Figure 3-6.

Information enters the system through the various perceptual systems,
which have brief holding capacities in the form of short-term sensory storage
(STSS). Attentional mechanisms select information for further processing in
short-term memory (STM), which has a capacity of approximately five to
seven chunks of information. When information is arriving at fairly high
rates, an individual item typically is maintained in short-term memory for a
few seconds. When a subject memorizes a list of items, a representation of

general structure of semantic and factual knowledge and thus become part of the
person's permanent store of knowledge.

Figure 3-6 A model of information-processing systems. (Labels legible in the
figure include "SEMANTIC AND FACTUAL KNOWLEDGE.")

Markov Chain with Short-Term Memory

During study of a list of items, subjects sometimes know the correct 
answer on a test before the item has been learned because the tested item was 
studied recently and is still in short-term memory. A simple change in the 
all-or-none model provides a system that can be used to analyze some aspects 
of short-term memory. 

A fairly general version of the appropriate Markov model is shown in 
Figure 3-7. Two changes from the all-or-none model are important. First, 


there are three states rather than two: an item that is in short-term memory 


is in State S. Note that State 5 in the Markov model is not the same as the 
short-term memory system indicated in Figure 3-6. The Markov model de- 
scribes what happens to a single item during an experiment. The item may be 
in State S— this occurs on a trial when the item is part of the contents of short- 
term memory (STM). State L in Figure 3-7 applies after the item has been 
learned. In most experiments, an item is counted as learned if the subject 
gives its response correctly for a few tests. It probably is not necessary for an 
item to be integrated into the subject's permanent store of semantic and 
factual knowledge to be counted as learned in most experiments. Thus, entry 
into State L probably corresponds to storage in intermediate-term memory 
(ITM) in a way that permits reliable retrieval. State U is the state of an item 
that is not stored in the subject's memory when it is presented. 

Second, the transitions assumed when an item is presented correspond to 
assuming that all items are processed through short-term sensory storage at 
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least to the level of short-term memory. If an item is in State U when it is 
presented, there is probability c_1 that it will be learned; otherwise, the item 
goes into short-term memory. If the item is already in State S when it is 
presented, it is learned with probability c_2; otherwise, it remains in short- 
term memory. Between an item’s presentations, it can make a transition from 
State S to State L; this has probability d, and a probable mechanism is 
rehearsal of the item during presentations of other items. If the item does not 
go into State L, it will remain in State S with probability 1 - f; otherwise it is 
forgotten, which corresponds to going into State U. 
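
The transitions just described can be written as two small matrices and multiplied out. The Python sketch below is an illustration only and is not taken from the source. It labels the two learning probabilities c1 (item presented in State U) and c2 (item presented in State S), reads the retention probability as 1 - f, and assumes that the between-presentation transition is applied once for each intervening item. All function names are hypothetical.

```python
# States are indexed U = 0 (not in memory), S = 1 (short-term memory), L = 2 (learned).
U, S, L = 0, 1, 2


def presentation_matrix(c1, c2):
    """Transition probabilities applied when the item is presented."""
    return [
        [0.0, 1.0 - c1, c1],  # from U: learned with c1, otherwise enters S
        [0.0, 1.0 - c2, c2],  # from S: learned with c2, otherwise stays in S
        [0.0, 0.0, 1.0],      # L is absorbing
    ]


def interval_matrix(d, f):
    """Transition probabilities applied between presentations (assumed: once per intervening item)."""
    return [
        [1.0, 0.0, 0.0],                            # U stays in U
        [(1.0 - d) * f, (1.0 - d) * (1.0 - f), d],  # from S: forgotten, retained, or rehearsed into L
        [0.0, 0.0, 1.0],                            # L is absorbing
    ]


def step(dist, matrix):
    """Advance a row vector of state probabilities through one transition matrix."""
    return [sum(dist[i] * matrix[i][j] for i in range(3)) for j in range(3)]


def prob_learned(c1, c2, d, f, lag):
    """Probability of being in State L after two presentations separated by `lag` items."""
    dist = [1.0, 0.0, 0.0]  # the item starts in State U
    dist = step(dist, presentation_matrix(c1, c2))
    for _ in range(lag):
        dist = step(dist, interval_matrix(d, f))
    dist = step(dist, presentation_matrix(c1, c2))
    return dist[L]
```

Under these assumptions, with c1 = 0.3, c2 = 0.1, d = 0, and f = 1, `prob_learned` gives about 0.37 at lag 0 and about 0.51 at lag 1, which is the direction of the spacing effect discussed in this section.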

The model shown in Figure 3-7 has been used in analyses of interactions 
between short-term retention and transfer of information to long-term 
memory. Bjork (1966), Greeno (1967), and Kintsch (1966) used the model to 
study effects of spacing between presentations of an item. When two presenta- 
tions of an item are given with few or no other items between them, perfor- 
mance on a later test is worse than if presentations of an item are 
separated by presentation or test of several other items. 

This effect of spacing can be explained in two general ways. One way notes 
that with successive or nearly successive repetitions, the item will probably 
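be in State S when the second presentation occurs, and that c_2 (the learning 
probability for an item presented in State S) might be smaller than c_1 (the 
learning probability for an item presented in State U) in the model. This 
could occur for either of two reasons. The 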
first possible assumption is that if an item is in State S when it is presented, 
the subject does not try as hard to process that item for long-term retention. 
This could be due to a simple strategy of conserving information-processing 
capacity; that is, the subject relaxes a bit when the item that is presented is 


tween the item’s presentations, inadequacies in the encoding of the item would 
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ments: that is, some interpresentation rehearsal occurs, but it is reduced when 
interpresentation intervals are short; at the same time, when an item is shown 
that is already in short-term memory, subjects tend to reduce their processing 
or maintain the encoding they already have achieved. (A thorough review of 
this issue has been given recently by Hintzman, 1974.) 

Additional features of the interaction between short-term retention and 
longer-term memorizing have been inferred when temporal variables have 
been analyzed in greater detail. Rumelhart (1967) showed that an interaction 
between the interval separating two presentations and the interval separating 
presentations and test can be explained by assuming that the subject does not 
always attend to a presented item. In the model this would be represented as 
a nonzero probability of remaining in State U during a presentation. Young 
(1971) showed that further facts about spacing, including an eventual reversal 
of the spacing effect (very long interpresentation intervals are worse than 
moderately long ones) and some time-dependent effects of test trials, can be 
explained by assuming that two levels of short-term memory exist. Young's 
theory includes an immediate short-term memory state from which items can 
always be retrieved, and a more remote short-term memory state from which 
failures of retrieval can occur, but from which learning occurs with nonzero 
probability on presentations and tests. Izawa (1971) has given an alternative 
explanation to Young's hypothesis of two levels of short-term memory. 
Izawa's idea is that a test can have a potentiating effect on later study trials, 
thereby causing improvement in performance by indirect means, rather than 
by learning that occurs during the test. 

An important application of the model shown in Figure 3-7 was developed 
by Atkinson (1972). When students work on a memorizing task such as 
foreign-language vocabulary, they should benefit from selecting items for 
study in some optimal way. For example, if c_1 (learning from State U) is 
substantially larger than c_2 (learning from State S), then putting long 
intervals between successive presentations of a single item 
will be more effective [...] 

[...] to select items for study in a vocabulary-learning 
task. Students worked on a set of 84 German vocabulary items in daily ses- 
sions. The items were divided into seven sets of 12 items each, and each trial 
of the experiment involved one set, with a single item selected from the set 
for study. In different conditions different methods were used to select the 
item to be studied. As a baseline, there was a condition involving random 
selection. In a second condition, the subject selected the item to be studied on 
each trial. 

In the remaining two conditions, the model shown in Figure 3-7 was used 
to select items. On each trial, a calculation was made for each item in the set 
for that trial to determine the probability that if the item was presented, it 
would cause a transition into State L. The calculation used all the responses 
the subject had made on that item on previous trials. The item with the highest 
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long-term memory is conducted, and the probability of retrieval depends on 
the amount of information stored in long-term memory while the item was in 
the buffer. Thus, the probability of correct response depends on the time an 
item resided in the buffer; hence, on variables like those described above. The 
quantitative effects of these variables can be calculated in the form of theore- 
tical predictions when parameters of the model are estimated. Parameters 
include the size of the buffer, rate of transfer of information to long-term 
memory, the rate at which information is lost from long-term memory, and 
the number of times information is taken from long-term memory in the search 
for the item. The model has given good explanations of an impressive variety 
of experimental findings, and interesting variations and elaborations of the 
model have been developed for special experimental conditions. 

A theory dealing with many of the same empirical phenomena as Atkinson 
and Shiffrin's, but based on a different idea, was given by Bernbach (1969). 
Rather than assuming that short-term and long-term memory constitute dis- 
continuously different systems, Bernbach postulated a single memory system. 
The main assumptions of Bernbach's analysis involve a process of rehearsal 
that has the effect of creating replicas of studied items in memory. When a 
subject studies an item there is specific rehearsal that results in storage of 
some replicas of the presented item. Following specific rehearsal of the pre- 
sented item, a general rehearsal process is carried out, involving all items 
having one or more replicas in memory. The number of replicas created in 
each process is a random variable. During specific rehearsal k_1 replicas of the 
presented item are made, and k_1 has the Poisson distribution with parameter 
λ_1; a total of k_2 replicas are made during general rehearsal, and k_2 has the 
Poisson distribution with parameter λ_2. Each time a replica is created in 
general rehearsal, every item having one or more replicas in memory is equally 
likely to be the item getting the added replica. Replicas are lost when a new 
item is studied. Each time specific rehearsal occurs, every item with replicas 
in memory loses a single replica with probability δ. 
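
The replication process described above lends itself to simulation. The Python sketch below is an illustration rather than Bernbach's own procedure. The names `lam1`, `lam2`, and `delta` stand for the two Poisson parameters and the loss probability, and two points the text leaves open are settled by assumption: loss is applied before the new replicas are added, and the just-studied item counts as eligible for general rehearsal once it has at least one replica. The Poisson sampler uses Knuth's multiplication method so that only the standard library is needed.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate by Knuth's multiplication method."""
    limit = math.exp(-lam)
    count = 0
    product = 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def study(memory, item, lam1, lam2, delta, rng):
    """Update replica counts (a dict of item -> count) for one study event."""
    # Loss (assumed to come first): each item holding replicas loses one with probability delta.
    for name in list(memory):
        if memory[name] > 0 and rng.random() < delta:
            memory[name] -= 1

    # Specific rehearsal: k1 ~ Poisson(lam1) new replicas of the presented item.
    memory[item] = memory.get(item, 0) + poisson(rng, lam1)

    # General rehearsal: k2 ~ Poisson(lam2) replicas, each given to an item
    # chosen uniformly among the items that have at least one replica.
    eligible = [name for name, count in memory.items() if count > 0]
    if eligible:
        for _ in range(poisson(rng, lam2)):
            memory[rng.choice(eligible)] += 1
    return memory
```

Repeated calls to `study` with a seeded `random.Random` give one sample path of replica counts; averaging over many paths would approximate the quantities that the text says are hard to compute analytically.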
It is harder to calculate theoretical predictions for many experiments using 
Bernbach’s model than it is using Atkinson and Shiffrin’s. Computations 
have been carried out for a number of experiments, and results show that a 
single-system theory of the kind Bernbach developed has many of the same 
implications for experimental results as the theory of separate short-term 
and long-term memory systems developed by Atkinson and Shiffrin. The 
important theoretical point established in Bernbach's work is that transient 
phenomena of the kind studied intensively in memorizing experiments do not 
necessarily require a theory having a discontinuous distinction between long- 
term and short-term memory systems. On the other hand, the theory with 
separate systems seems to be more tractable, and because the two theories 
have similar empirical content, it is reasonable to use the theory with sepa- 
rate systems in analyzing effects of empirical variables that affect the ease of 
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ory trace, incorporating the concept in an analysis of relationships between 
perception, short-term memory, and long-term memory. 

Norman and Rumelhart’s theory incorporates hypotheses about percep- 
tion given by Rumelhart (1970), where it is assumed that features of a stimu- 
lus are registered, and recognition of a stimulus occurs if a sufficient number 
of features is extracted. Norman and Rumelhart postulated a naming 
mechanism that includes a dictionary, containing names and vectors of fea- 
tures for the set of items that the subject is expecting. A name is assigned to 
the stimulus if all the features extracted in perception match elements in 
only one of the feature-vectors in the dictionary. (An alternative assumption 
is also described, in which a subject guesses between alternative items if more 
than one dictionary entry matches the perceived features. However, it is 
assumed that the more conservative strategy of assigning a name only if all 
the perceived features match is probably used in most situations.) When a 
name is assigned, the named item’s vector of features is registered in short- 
term memory. 
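
The naming rule in this paragraph amounts to a subset test against each dictionary entry. The Python sketch below illustrates the conservative strategy only. The function names and the toy feature labels are hypothetical, and the representation of a feature vector as a set is an assumption made for brevity.

```python
def matching_names(extracted, dictionary):
    """Names whose stored feature set contains every extracted feature."""
    return [name for name, features in dictionary.items() if extracted <= features]


def assign_name(extracted, dictionary):
    """Conservative strategy: name the stimulus only if exactly one entry matches."""
    matches = matching_names(extracted, dictionary)
    return matches[0] if len(matches) == 1 else None


# A toy dictionary for the items the subject is expecting (feature labels are made up).
EXPECTED_ITEMS = {
    "A": frozenset({"apex", "crossbar", "left-diagonal", "right-diagonal"}),
    "H": frozenset({"crossbar", "left-vertical", "right-vertical"}),
}
```

With this dictionary, extracting only the crossbar leaves both entries as candidates, so no name is assigned; extracting the crossbar and the apex singles out "A". The alternative guessing strategy mentioned in the text would instead choose among the names returned by `matching_names`.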

The attributes in short-term memory are assumed to decay as they are 
held, and Norman and Rumelhart obtained evidence that loss of attributes’ 
clarity in short-term memory depends mainly on the amount of time since 
the item was registered rather than on the number of items presented during 
its residence. While the item's features remain in short-term memory, infor- 
mation is in the form of an association between an attribute and the context 
in which the attribute was perceived. At any specific time, the attention given 
to each item in short-term memory is proportional to the clarity of that 
item’s attributes relative to all the items then in short-term memory. And the 
probability of transferring an attribute-context association to long-term 
memory depends on the amount of attention the subject is giving to the item. 
The theory thus describes a system in which a recognized item is represented 
in short-term memory as a vector of its attributes, and association between 
some of the attributes and the context in which they were perceived is trans- 
ferred to long-term memory. A more complete representation, involving 
more of the item’s attributes, will result if fewer other items are in short-term 
memory, and a more complete representation will be favorable to perfor- 
mance on retention tests. 
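
The statement that attention is proportional to relative clarity can be made concrete with a simple normalization. The Python sketch below is a minimal illustration under that reading. The function name and the numerical clarity values are hypothetical, and the mapping from attention to transfer probability is left out because the text does not specify it.

```python
def attention_shares(clarity):
    """Share of attention per item, proportional to the clarity of its attributes."""
    total = sum(clarity.values())
    if total == 0:
        # No item has any clarity left, so no attention is allocated.
        return {item: 0.0 for item in clarity}
    return {item: value / total for item, value in clarity.items()}
```

For example, an item with clarity 3 held alongside an item with clarity 1 receives three quarters of the attention, and it receives all of it when it is alone in short-term memory, which matches the claim that fewer competing items yield a more complete representation.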

If a subject is shown an item and asked whether it was in the list studied, 
the features of the test item are compared with attributes stored in long-term 
memory to see whether an item matching the present one was included in the 
list. If a criterion number of the test item’s features are found associated with 
a context from the list, the subject responds that the item is recognized. The 
theory provides a way of explaining performance with different criteria of 
recognition (a stricter criterion corresponds to requiring a larger number of 
found features) and the effect of presenting test items that are more or less 
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(1968). Laughery postulated a single short-term memory system in which 
input items are stored and held, if possible, until needed for output on a test. 
A rehearsal mechanism was postulated, and a process of updating the con- 
tents of short-term memory by rehearsal preserved information in the system. 
In Laughery's theory, long-term memory holds the dictionary of items known 
by the subject. Long-term memory is not a separate storage system used to 
hold the items presented during a specific memorizing task. 

Laughery’s program simulates performance in memorizing tasks where 
items are presented either visually or auditorily. With visual presentation, the 
items’ visual features are given as input; with auditory presentation, the 
inputs are vectors of phonemic features. Inputs are placed in a "window" 
(actually, a set of computer memory locations that the simulation program 
interrogates from time to time to receive new items). Laughery assumed that 
items are represented in short-term memory by vectors of phonemic features. 
When inputs are received by auditory presentation, phonemic features are 
transferred directly from the “window” to short-term memory in the form of 
a memory structure containing the name of the structure and phonemic fea- 
tures as substructures. Each component substructure includes a time tag and 
a decay parameter. The decay parameter used at initial storage of a com- 
ponent is a characteristic of that component. If visual presentation is used, 
input visual features are sorted through long-term memory. This should 
result in finding the input item, a process of recognition. When the item is 
recognized a short-term memory structure based on its phonemic features is 
created. 

Each time a new memory structure is created, the structure created just 
previously is given a substructure that links it to the new item. This linking 
information consists of the address in memory where the newly created item 
can be found. Links are themselves components of memory structures, and as 
such they have time tags and decay parameters. They provide the mechanism 
for determining the order in which information is stored and retained. 

Laughery's theory is like Bower's (1967b) multicomponent theory of the 
memory trace in assuming that individual components of the memory struc- 
ture are lost over time. Each component has a characteristic decay param- 
eter β, and is assumed to decay in such a way that the probability of 
retrieval at time t is p = e^(-βt), t being the time since the component was stored 
or updated by rehearsal, and e being a standard constant. Rehearsal occurs 
whenever there is time for it. When rehearsal is carried out, items in short- 
term memory are retrieved in order, the order being determined by the linking 
components of the memory structure. Rehearsal of an item consists of retriev- 
ing the components of the item, and then updating the time tag and reducing 
the decay parameter for the components of the item found in long-term 
memory. Rehearsal thus resets the time clock associated with the components 
of an item and also produces a reduced rate of decay for the components. 
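
The decay-and-rehearsal mechanism can be sketched in a few lines of Python. This is an illustration, not Laughery's program. It reads the retrieval formula as exponential decay, p = e^(-beta * t), which is an assumption about a formula that is garbled in the scanned text, and it introduces a hypothetical `reduction` factor because the text does not say by how much rehearsal lowers the decay parameter.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """One component of a memory structure, with a decay parameter and a time tag."""
    beta: float       # decay parameter (larger means faster loss)
    stored_at: float  # time tag: when the component was stored or last rehearsed

    def retrieval_probability(self, now):
        """Assumed form: exponential decay in the time since the time tag."""
        return math.exp(-self.beta * (now - self.stored_at))

    def rehearse(self, now, reduction=0.5):
        """Reset the time tag and shrink the decay parameter (reduction is hypothetical)."""
        self.stored_at = now
        self.beta *= reduction
```

A component rehearsed at time t is retrievable with probability 1 at that moment and decays more slowly afterward, which reproduces the two effects of rehearsal named in the text.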
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Figure 3-8 Graphical representation of two-stage learning. 
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The second stage of learning is accomplished when the item goes into 
State L. This may occur on the same trial as the transition from State U; 
with probability b the second stage takes no trials beyond those needed for 
Stage 1. But with probability 1 - b an item that has left State U goes into an 
intermediate State I, and some trials are needed before the second stage of 
learning is accomplished. The values of b and c give measures of the difficulty 
of the second stage of learning. Large values of b and c correspond to short 
residence in State I and therefore indicate that the second stage of learning is 
easy. Small values of b and c indicate that the second stage is difficult since 
the system will be in State I for many trials. 
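
The dependence of second-stage difficulty on b and c can be checked numerically. The Python sketch below is an illustration under a stated assumption that the text does not spell out: an item entering the intermediate state spends at least one trial there and then moves to State L with probability c after each trial, with c greater than zero. The function names are hypothetical.

```python
import random


def trials_in_intermediate(b, c, rng):
    """Sample the number of trials an item spends in the intermediate state."""
    if rng.random() < b:
        return 0  # second stage completed on the same trial as the first
    trials = 1
    while rng.random() >= c:  # stays in the intermediate state with probability 1 - c
        trials += 1
    return trials


def expected_trials_in_intermediate(b, c):
    """Mean of the distribution sampled above: (1 - b) / c."""
    return (1.0 - b) / c
```

Under this convention the expected residence is (1 - b) / c, so raising either b or c shortens the stay in the intermediate state, in line with the interpretation given in the text.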

In the first section of this chapter we described statistical methods that can 
be used to test whether learning in some situation is all-or-none. Those same 
methods can be used to test a two-stage model, although the formulas invol- 
ved are somewhat more complicated. We will discuss the statistical properties 
of two-stage learning in somewhat more detail in Chapter 4, where we will 
begin to make substantive inferences based on measurements of difficulty of 
the two stages of learning. But to illustrate the general ideas involved, we will 
give a brief introduction here. 

According to the Markov model, the two stages of learning are independent 
all-or-none events. Let Z_1 be the number of trials that are spent in State U, 
and let Z_2 be the number of trials spent in State I. In other words, Z_1 is the 
number of trials needed to complete the first stage of learning, and Z_2 is the 
number of trials needed to complete the second stage. The distributions of 
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Polson et al. proposed that the first stage involves storing a representation 
of the association, and the second stage involves refining that representation 
if necessary to avoid confusions between similar stimuli. But there are other 
interpretations that have been given to the two stages of memorizing, depend- 
ing on the nature of the task studied and the kind of process inferred to be 
occurring in the learning situation. 

One alternative that has been used by Atkinson an 
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Another idea that leads to a two-stage process is that an item can begin 


with a wrong response associated with the stimulus. Then the first stage of 
learning consists of unlearning the wrong association: an item in State I has 
no stimulus-response connection and the subject will be able to guess the 
correct response. Once the wrong association has been unlearned, the cor- 
rect response will be learned in an all-or-none fashion. This idea was used by 
Bower and Theios (1964) in an application of the model to an experiment in 
which responses were changed after subjects had learned them. State U was 
thus interpreted as a state in which the response learned first was still con- 
nected to its stimulus, after the experimenter had begun reinforcing a different 
response for that stimulus. Bernbach (1965) used a similar idea, but assumed 
that learning of wrong responses could occur during an ordinary paired- 
associate experiment if the subject had to guess a response and retained the 
wrong guess rather than the response declared correct by the experimenter. 
Along similar lines, Millward (1964) and Nahinsky (1967) considered a sys- 
tem in which partial learning consists of learning to avoid giving certain 
errors. The probability of correct response for an item increases before the 
correct association is learned because the subject learns that some subset of 
response alternatives is incorrect. 

A hypothesis we will discuss thoroughly in Chapter 4 is that Stage 1 of 
paired-associate learning is acquisition of the response while Stage 2 is 
learning the correct stimulus-response connection. This idea was developed 
[...] (1960); 
Kintsch (1963) used it in a quantitative analysis. 

Another hypothesis has been used by Estes and DaPolito (1967) and by 
Kintsch and Morris (1965). The first stage of learning is assumed to involve an 
accomplishment that allows the subject to recognize the item, but not neces- 
sarily to recall it. The transition to State L occurs when the subject becomes 
able to recall the item. Estes and DaPolito used the idea in connection with 
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. The question considered in this ch 
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ап important general problem that wi 1 > сопс 
tion between ideas. We suggest that the classical associationist account of 
this process presents difficulties, but that analyses of cognitive structure such 
as those discussed in Chapter 2 provide a promising basis for developing new 
approaches to the problem. 


The main task of this chapter involves analysis of the process of learning 
new associations between ideas as this is observed in laboratory experiments. 
It is generally agreed that the process of memorizing paired associates is not 
a simple, unitary process except under quite special circumstances. We review 
two hypothetical subprocesses, response learning and stimulus encoding, 
that have been incorporated into standard associationist analyses of paired- 
associate memorizing. We will argue that the cognitive theory sketched in 
Chapter 2 provides an alternative interpretation for the kinds of phenomena 
that led associationists to postulate these auxiliary processes. Finally, we 
include quantitative analyses of the process of paired-associate memorizing, 
using the two-stage Markov model discussed at the end of Chapter 3. 


COMPLEX IDEAS 


The concept of association is important because it offers solutions for 
fundamental intellectual problems. One of the major questions this concept 
has been used to answer is, How do people come to have complex ideas? 
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to A Nm intricate pattern of forward, backward, and remote 
[...] when mental elements are combined in a complex structure. 
[...] that nonsense syllables were chosen as the elements. 
[...] a word like “dog” is considered to be relatively 
simple, and a nonsense syllable such as “dwg” is assumed to be 
more complex. [...] a century ago, words were assumed to have meaning by 
[...] with other words and ideas. Thus, a familiar word was 
believed complex because of its many associations with other ideas in cogni- 
tive structure, while an unfamiliar syllable was believed more simple because 
of its lack of associative connection with ideas already in the mind. 


General Critique 
[...] complex concepts, the basic idea of the associationist analysis seems 
[...]. Certainly persons develop ideas that are composites of other 
[...], and surely an understanding of the rules of combination by which 
[...] ideas are composed into complex ideas is a central theoretical 
[...]. However, the associationist analysis [...] ideas of a complex concept [...]. 
[...] Ebbinghaus' analysis to be relevant to the problem of forming complex ideas, 
it must be assumed that a complex idea can be composed of arbitrary com- 
ponents arranged in an arbitrary sequence. Contrast this with the kind of 
complex idea illustrated in Figure 2-3. That idea represents an event in which 
Peter put a package on a table, and is analyzed in Norman and Rumelhart's 
theory as a combination of component ideas. However, the form of the com- 
bination is strongly restricted. Component ideas are related in specific ways, 
rather than by undifferentiated connections. And the whole idea is organized 
by the constraints of a schema: the general idea of "putting" is known, and 
the specific event involving Peter uses a pattern of relations that is known. 
The question arises, then, of whether the associationist solution to the 
problem of complex ideas has any generality beyond the artificial composi- 
tions of unrelated words and nonsense syllables that were contrived by 
investigators to minimize the meaningfulness of the complex idea to be 
acquired. We emphasize that we are not questioning the scientific practice 
of studying artificial tasks in order to test ideas about acquisition of complex 
ideas in experiments on rote memorizing: the failure is not due to the artifici- 
ality of the situation studied. On the contrary, the failure, if it has been a 
failure, was brought about by use of a mistaken theory that wrongly charac- 
terized the nature of complex ideas. If complex ideas are really complex 
relational structures, not simply connected networks, information about their 
[...] studies of comprehension and learning of relations [...]. 
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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
These involve schemata of varying complexity, and the more complex con- 
cepts have components that are present in the [...]. 
[...] process of developing these schemata can probably be understood as an 
elaboration in which new components are added to previous structure. 


ASSOCIATION AS STIMULUS-RESPONSE 
CONNECTION 


During the first half of the twentieth century, American psychologists who 
studied learning became increasingly influenced by behaviorism, a method- 
ological program developed in biology and imported to psychology by 
Watson (1919), among others. They turned their attention away from ques- 
tions about association between ideas in the mind and toward questions 
about association between stimuli and responses. A sharp distinction between 
stimulus and response does not arise naturally when the concept of associa- 
tion is used to explain the emergence of complex ideas, but it does arise in 
the experimental situations that have been used to investigate processes of 
association. This is particularly true of paired-associate memorizing, and in 
recent years most of the experiments used to develop and test concepts of 
association theory have used the method of paired associates. 

If one considers an association to be a connection between a stimulus and 
a response, then paired-associate memorizing represents a paradigm case for 
association theory, in that the process of forming an association can be 
observed in a relatively simple and pure form. The investigator can specify 
what the stimulus is on each trial, to a greater degree than in other memoriz- 
ing tasks such as serial or free recall memorizing (see Underwood, 1964). 
Further, a single correct response for each stimulus is specified. It is not 
surprising, then, that the process of learning to give the correct response for 
a single paired associate has been considered to be a kind of two-body 
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than nonmeaningful ones. The idea that meaningfulness primarily affects the 
response learning component is supported by the fact that varying the mean- 
ingfulness of stimuli leads to far smaller effects on difficulty of learning than 
does varying the meaningfulness of responses (e.g., Cieutat, Stockwell & 
Noble, 1958). Apparently, meaningfulness has a more important influence 
than pronounceability on the difficulty of remembering items: a list of such 
items as AFL and TWA has been found easier for subjects to remember than 
a list of the same items with their letters rearranged to make them pronounce- 
able (FAL, TAW); however, the pronounceable list was easier than items 
that were neither pronounceable nor meaningful, for example, LFA and 
WTA (Gibson, Bishop, Schiff & Smith, 1964). 

Another finding that has been interpreted as support for the hypothesis of 
response learning involves transfer to paired-associate memorizing after free 
recall pretraining. If subjects are first trained to recall a list of items, and 
these items are subsequently used as responses for paired associates, then 
the associations are easier to memorize. This effect is especially strong with 
response terms low in meaningfulness, but it has also been shown to occur for 
highly meaningful (word) responses (Underwood, Runquist, & Schulz, 1959). 
An explanation in associationist theory is that during pretraining the subjects 
accomplish the response-learning phase of association learning; therefore, 
only the connections between responses and stimuli remain to be learned 
during the paired-associate training. 
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ing and learning of stimulus-response connections. In the experiment, 
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learning but, because of à difficulty in discrimination, retards learning of 
stimulus-response connections. 


"A in meaning, while in 
Subjects were given paired-assoc 
and asked to rec 


then were stopped ll o 
rms Were simil 


Critique of Associationist Interpretation 
An important question is whether response learning, as conceived in the 
modification of associationist theory, undermines the basic premise of asso- 
ciationism that all knowledge is built by forming connections between sub- 
units. The importance of response learning might be taken as evidence that 
even in memorizing associations there is a critical nonassociative process. 
Associationist theory can comfortably assimilate a process of response 
learning if an analysis of response learning shows that it is itself a process of 
forming associations. Such an analysis is theoretically possible. Response 
integration could be a process consisting of forming associations between 
elements of the response [...] integrating the components of a [...] 
in learning a list, and a [...] theoretical [...] 
could be carried out. An associative 
explanation of the process of [...] 
forming associations between the responses and 
the background [...] in the situation, thus increasing avail- 
ability of the responses. 


[...] integration and availability are [...] connections. There are substantial 
[...] of response integration. One source of doubt is Underwood 
and Schulz' (1960) finding that the ease of learning a list of 
trigrams was more strongly dependent on the pronounceability of the 
trigrams than on the frequency with which the trigrams appear in written 
English. One might have thought that frequent experience with letter 
combinations would be optimal for producing an advantage in [...] 
among the components. Instead, the result [...] 
suggests that integration is better aided by having 
components that can easily be fit together into a pattern of phonetic relations. 
The plausible analogy between response integration and list learning gives 
an additional doubt that response integration is a process of forming 
connections between components. Recent analyses have increasingly empha- 
sized that subjects learn lists by finding relationships among the items [...] 
a structure of [...] groups of items, rather than a [...]. 
Subjects make strong use of relationships that [...] 
serial lists (Restle [...], 1970) and lists presented [...] 
(Bower, Clark, Lesgold & Winzenz, 1969; Bousfield, [...]). 
Strong correlations are found between 
[...] organization and success in recall (Mandler, 19[...]). 
Furthermore, measures of consistency in recall from trial to trial are consis- 
tent with the idea that memorizing [...] essentially amounts to finding a way 
to organize it (Martin [...], 19[...]; Tulving, [...]). 

The weight of evidence seems unfavorable [...] that lists and indi- 
vidual responses are integrated by [...] simple associative connections. (For a 
contrary conclusion, see Postman, 1972.) However, there do not seem to be 
strong objections to the idea of contextual associations. Experimental 
findings seem to indicate a strong ability of subjects to [...] 
occurred together in a list; the hypothesis that subjects store information 
about list membership and other context [...] appears in [...] current 
theory we are familiar with (e.g., Anderson & Bower, 1973; Kintsch, 1974; 
Norman & Rumelhart, 1970). Of course, if it is accepted that responses are 
organized into a structure, rather than connected by undifferentiated associa- 
tions, the form in which contextual information appears will be affected. It 
seems more realistic to include context information as a part of the total 
cognitive structure representing the list, rather than as a stimulus connected 
to each response item by separate links. However, the basic idea of a connec- 
tion from the context to the items in a list seems well motivated and reason- 
able. 



Cognitive Interpretation of Response Learning 

The questionable aspect of the theory of response learning is the hypothesis
that integration consists of forming connections among components. We now
will present an interpretation of some phenomena that previously have been
considered evidence for the associationistic concept. We will present
interpretations of four facts: first, that pairs with meaningful responses
are easier to learn; second, that meaningfulness of stimuli has little or no
effect on difficulty of associative learning; third, that pretraining a
subject to recall a response list facilitates learning of associations; and
fourth, that similarity among responses causes an increase in response recall
when associative learning is interrupted, despite the greater difficulty it
produces in learning the associations.

The first fact to be explained concerns the meaningfulness of responses.
Familiar words are already represented in the subject's long-term memory.
Nonsense syllables, which are not recognized as words, must be treated as
combinations of letters; the subject has representations of the letters
stored in long-term memory and can construct a syllable from a sequence of
those units. Syllables that are not words but remind the subject of words
are generally syllables that share phonemic or other features with words
that are familiar to the subject.

In memorizing an association, the subject must store a representation of
the pair in memory. It is not surprising that subjects can represent a
familiar word response more easily than a novel string of letters. An
explanation for this is based on the idea that storing the pair requires
processing in short-term memory. This idea was discussed in detail in
Chapter 3. Especially relevant is Laughery's (1969) and Norman and
Rumelhart's (1970) idea that items stored in short-term memory are entered
through a naming device, which uses names found in long-term memory. If a
nonmeaningful syllable is to be stored, its representation must be a
sequence of letters, and Murdock (1961) obtained results suggesting that a
three-letter item makes the same demands on the system as a sequence of
three short, familiar words. The relative advantage of nonsense syllables
that are more meaningful, in that they remind subjects of words, is not
explained in any current theory, although ample experimental evidence
(e.g., Dallett, 1964; Lindley, 1963) shows that an advantage is available
to subjects for storing syllables that are related to familiar words.

An interesting hypothesis about so-called meaningful nonsense syllables
is that they are encoded according to schemata based on their similarity to
familiar words. The proposal that schemata play an important role in memory
is credited to Bartlett (1932), whose experimental support for the idea was
convincing, albeit informal. Bartlett's best-known examples involve memory
for stories: he observed that the general pattern of a story was remembered
well, but details tended to be forgotten or distorted to fit the pattern.
More recently, Posner and Keele (1968) have shown the effectiveness of
relatively complex relational properties of dot patterns in establishing
schemata and have demonstrated that subjects can correctly classify new
patterns that are constructed by varying the original pattern. Norman and
Bobrow (1976) have given a general discussion that emphasizes the
importance of schemata that are stored in memory for processes of
identifying patterns. The encoding process that makes the more meaningful
nonsense syllables easier to learn than less meaningful syllables probably
has much in common with the schema-based encoding and [...] by Bartlett,
by Norman and Bobrow, [...]
[...] contextual associations play a role in acquisition of paired-associate
lists, and pretraining of the set of responses used could have a
facilitating effect. On the other hand, when subjects study a list for free
recall, they acquire an [...] structure that makes retrieval of the
responses possible. When items from this list appear later as responses in a
list of paired associates, the subjects must alter the original structure so
that the paired-associate stimuli can be used as specific retrieval cues for
their respective responses. Whether the availability of an organized set of
responses should facilitate learning of [...] associations and that needed
[...] response list. Although it is not inconsistent with our interpretation
that pretraining on responses can facilitate paired-associate memorizing, it
would not be surprising if the opposite effect were to be found in some
cases.

The fourth fact to be explained is Underwood, Runquist, and Schulz's
(1959) finding: Although learning of associations is made slower when
response terms are similar to each other, if subjects are stopped after a
number of trials and asked to recall responses, they will give more of the
words. The interpretation in associationist theory is that response learning
is facilitated by similarity of responses, but association learning is
retarded by difficulty in discrimination. An alternative hypothesis is that,
in paired-associate training, subjects store representations of the
stimulus-response pairs, rather than representations of responses, stimuli,
and connections. If a subject has been working on memorizing associations
for a while, the request to recall responses calls for retrieval of part of
each stored record, a retrieval task for which the representations in memory
are not ideally suited. However, when responses are similar, more
information linking responses with other responses is available in memory,
and this feature could be responsible for the advantage in response recall
produced by similarity. The greater difficulty of associative learning with
similar responses may be due to greater difficulty of discrimination, as
supposed in associationist theory. But the greater ease of recalling similar
responses may not indicate an advantage in response learning as much as an
advantage in retrieving the stored information about responses.


STIMULUS ENCODING 

Underwood (1963) called attention to the need for analysis of stimulus
encoding by distinguishing between the nominal stimulus (the physical event
presented by the experimenter) and the functional stimulus (the mental
representation of that event). Of course, the stimulus that is associated
with the response is the functional stimulus, and Underwood concentrated on
situations where the functional stimulus apparently is a selected partial
representation of the nominal stimulus, such as the first letter of a
three-letter nonsense syllable or the color of a compound in which there is
both a nonsense syllable and a block of color.

[...] are asked to give the response that was paired with the stimulus
containing the component. Subjects frequently fail to give the correct
response for some components.

An important set of findings by Harrington (1969) showed that the selection
of stimulus components takes place in a systematic way. As in most studies
using compound stimuli, Harrington's experiments presented stimuli whose
components fell into two distinct categories. In one experiment, each
stimulus was a pair of short words, typed on a single line. The components
could be classified according to their position. In another experiment, each
stimulus contained a meaningful word and a nonsense trigram. In this case,
components could be classified by position (left or right) or by
meaningfulness. Some subjects in this second experiment had lists in which
one component in each stimulus was emphasized by use of a colored
background. This permitted a third basis of classification: each component
was either the emphasized or nonemphasized member of its pair.

The important finding from Harrington's study was that subjects showed a
significant tendency to select elements in a single category. In the
experiment where stimuli were word pairs, each individual subject apparently
represented most of the stimuli with the component seen on one of the sides,
either left or right. When stimuli combined words and nonsense trigrams
without any emphasis, nearly all subjects used the words. But when one of
the components of each pair was given perceptual emphasis, some subjects
used the strategy of coding with the more meaningful component; others used
the strategy of coding the emphasized component.

Another finding of some interest regarding stimulus selection was obtained
by James and Greeno (1967). Groups of subjects studied paired-associate
lists with compound stimuli. Training was stopped at varying points for
different groups and they were given tests on stimulus components. Subjects
whose training was stopped at or before the point of learning the list
failed to give the correct response to many of the stimulus components, thus
giving evidence that they had stored partial representations of the stimuli.
However, subjects whose training continued beyond a criterion of learning
the list gave correct responses to a substantially greater number of
individual components, indicating that during overtraining subjects had
stored additional information about many stimuli.

Both Harrington's (1969) and James and Greeno's (1967) results point to the
idea that stimulus selection occurs because of a deliberate learning
strategy by the subject. Harrington's finding indicates that stimulus
components are selected on the basis of some identifiable classification
that can be made. James and Greeno's results are most easily explained by
assuming that the subject restricts attention to a minimal set of stimulus
components during the time the list is being learned but, after the list is
learned, the restrictive attention is relaxed because the demands of the
task are no longer as severe.
Critique of Associationist Interpretation


There is serious question whether any process of selective encoding is
compatible with the basic ideas of associationist theory. Earlier [...]
stimulus-response associationism (Spence, 1936) assumed that the stimulus
impinges on the subject and that learning affects only what is done in
response to the stimulus. The subject could learn a specific orienting
response in a situation, such as always looking on the right side of a
stimulus display, and this could explain selective effects based on physical
position. Selective attention to perceptual cues seems problematic on the
stimulus-response view, as does selection of components based on their
meaningfulness.

In a more general associationist framework, selective encoding can be
analyzed as the result of perceptual learning. Lawrence (1963) and [...]
(1955) have argued that perceptual learning can be analyzed as a process of
acquiring new stimulus-response connections involving perceptual or encoding
responses. On this view, distinct stimuli can be made more similar by [...]
response to them, and similar stimuli can [...] them to different encoding
responses. [...] associationist view was given by Gibson [...]. It is
generally reasonable to view perception as the result of [...] which the
individual seeks information, rather than [...] on the organism (Gibson,
1966). In this framework, [...] consider perceptual learning as a change in
the sensitivity [...] various kinds of information, rather than in the
perceptual responses of the individual to specific stimuli. Selective
attention occurs because in specific situations the individual identifies
certain kinds of information as being especially important, and seeks those
kinds of information particularly, perhaps by use of specific stimulus
analyzing mechanisms (Sutherland, 1959). If the hypothesis of active
stimulus analysis is accepted, then changes in stimulus encoding result from
structural changes in the perceptual system rather than in responses to
stimuli, and thus are hard to reconcile with the associationist position.

Cognitive Interpretation of Encoding 

In the theory of associative learning that we [...], encoding of stimuli
results from analysis by the [...] acquired. This is a form of the [...]
features noticed are those for which [...] tend to be in some easily
identified category. The retrieval system, which must begin by testing some
feature of each stimulus that appears, will be more efficient if the test
can be carried out on a component that is easily identified. For example, if
each stimulus is a pair of words, as in Harrington's (1969) first
experiment, a retrieval network using features of the left word for some
items and the right word for others would be quite inefficient. Each time a
stimulus is presented, one word or the other must be examined by the system;
consistently checking the features of the word on one specific side of the
stimulus pair makes matters much simpler.

We suspect that the preference for selecting meaningful components when
both meaningful and nonsense elements are included reflects the greater ease
in finding a relational encoding of the meaningful component with the
response. Apparently, this is not a large advantage: a list containing only
nonsense stimuli is not much harder to learn than a list with only word
stimuli. However, because a meaningful word corresponds to an entry in the
subject's long-term memory, it provides a relatively rich set of possible
relationships with other elements that are available for use in memorizing a
new association.


SEQUENCE OF STAGES


SEQUENCE OF STAGES 
We do not disagree with the assertion that both response learning and
learning to encode stimuli are processes that occur when subjects memorize
paired associates. We believe, however, that associationist analysis of
response learning probably is incorrect. And we believe that the need to
postulate selective perception of stimuli seriously weakens the argument
that learning is basically a process of forming stimulus-response
connections.

But the involvement of response learning and stimulus encoding in some
form is undeniable and can be deduced from the nature of the experimental
task of memorizing associations. In a test of a subject's memory of an
association, the stimulus term is presented and the subject tries to
remember the response term. In order to give the response correctly, the
subject must have a representation of the stimulus stored in memory,
including enough distinctive properties to avoid confusion with other
stimuli in the list. The subject also must have information specifying which
of the responses goes with the presented stimulus. Moreover, relatively
complete information about the response must be stored, allowing the subject
to perform the response term. Of course, other kinds of information may also
be stored, including relations among response terms and among stimulus terms
(cf. Battig, 1968), but there certainly must be information about the
stimulus, information about the response, and information that connects the
two elements.

Any theory of paired-associate memorizing must make some provision for
the acquisition of all the information needed to perform in the task. Both
the associationist theory and the cognitive analysis we have developed here
include processes that can explain the main facts of performance in
paired-associate memorizing experiments. However, a deep conceptual
difference


[...] between the response and general context stimuli (Underwood & Schulz,
1960). In associationist theory, there are many kinds of associative
learning in a paired-associate experiment, and formation of a connection
between the stimulus and response seems to play a relatively minor role in
the whole process.

On the other hand, the [...] (e.g., Asch, 1968; Horowitz & Prytulak, 1969)
and has [...] in a recent theoretical analysis by [...] distinguishing
between the points of view involves identification of processes that occur
at different stages of the learning process.
In the cognitive view we favor, a major component of a subject's task in
associative learning is to organize the stimulus and response into an
integrated unit. This might be preceded or followed by other processes
required by the task, but it would not be surprising if the process of
storing relational information about the stimulus-response pair were to
occur early in the process. The sequence of relational storage followed by
learning to retrieve would agree with conclusions by Estes and DaPolito
(1967) and by Kintsch and Morris (1965), who found evidence that learning to
recognize corresponds to a first stage of learning, and learning to recall
corresponds to a later stage. It also would agree with Polson, Restle, and
Polson's (1965) conclusion that acquiring the association precedes stimulus
discrimination.

On the other hand, in associationist theory acquiring the connection 
between stimulus and response has generally been expected to occur rather 
late during learning. The feeling that response learning probably precedes 
learning a connection has been made quite explicit, on grounds that a response 
cannot be connected to a stimulus if the response has not yet been learned. 
From the general view that learning involves storage of information, the 
question is whether early stages of learning focus primarily on the encoding 
of information about the response, or whether information about the stimu- 
lus-response pair is involved in the learning process from the outset. 

An experimental design that provides information about this question was
used in a study carried out by Michael Humphreys. Humphreys taught paired
associates to subjects; the lists of stimulus-response pairs that he used
are in Table 4-1. The point of the experiment was a simple one: to vary the
difficulty of learning by manipulating both the stimuli and the responses.
In this case, response difficulty related to the ease of pronouncing the
nonsense trigrams used. The stimulus difficulty involved similarity among
the stimuli used. The experimental variables were effective. The mean number
of


Table 4-1 Lists Used in Humphreys' Experiment

Easy Stimuli,    Easy Stimuli,    Hard Stimuli,    Hard Stimuli,
Easy Responses   Hard Responses   Easy Responses   Hard Responses

[?]              1-HPF            11-RAS           [?]
[?]              2-[?]PW          12-MAK           [?]
3-GAW            3-NPE            13-JAV           [?]
4-RAS            4-GPS            21-BAQ           [?]
5-BAQ            5-JPV            22-HAZ           [?]
6-LAN            6-MPA            23-FAC           [?]
7-DAP            7-BPC            31-DAP           [?]
8-JAV            8-XPO            [?]              [?]-RPK

Source: From Humphreys & Greeno, 1970.
errors per item in each group was easy stimuli and responses (EE), 4.3;
easy stimuli and hard responses (EH), 6.6; hard stimuli and easy responses
(HE), 7.1; hard stimuli and hard responses (HH), 9.5.

This experiment was conducted to obtain empirical evidence about the
effects of stimulus and response difficulty in the two stages of
paired-associate memorizing. If the first stage of learning is mainly a
process of acquiring responses, then difficulty of learning in the first
stage should be due primarily to the differences between the two kinds of
responses used in the experiment. That is, Groups EE and HE, both having
easy responses, should have a rather easy task in accomplishing the first
stage of learning; moreover, the first stage should not be much harder for
Group HE than for Group EE, because these groups differ only in the stimuli
to which the responses are to be connected. For the same reasons, Groups EH
and HH should find the first stage of learning difficult, but both should
find it equally difficult. On the other hand, if the first stage involves
storing information about the stimulus-response pair, then it would be
expected that both stimuli and responses would influence the difficulty of
accomplishing the first stage.

Evidence on the issue can be obtained only if it is possible to measure
difficulty of learning in the two stages separately, a technical problem
that may not be solvable. One technique, based on the idea that first-stage
learning is response acquisition, is to count the trials before [...] be
performed, so this observation [...] response learning. On the other hand,
[...] are trying to give the correct response for each stimulus and, as
Ekstrand (1966) pointed out, probably do not give just any response that
comes to mind unless they have some basis for giving it to the stimulus
being presented. [...] number of trials before first occurrence of a
response almost surely overestimates the number involved in learning that
response [...] response learning as such is occurring.

A second technique used in [...] studies [...] after a few trials and ask
[...]. Again, observations will [...] amount of response learning. Since
subjects must acquire responses in order [...] may not be related to the
difficulty of learning the responses; therefore this technique may also give
[...] set of results, as we have mentioned. Aside from the technical
drawbacks, [...] methods from our point of view [...] is the main process
involved in the first stage of [...] appropriate to test this theoretical
assumption using a method of obtaining measurements that is neutral with
regard to the theoretical question at issue.


MARKOV ANALYSIS OF STAGES 


A method used by Humphreys and Greeno (1970) in analyzing the results
of Humphreys' experiment seems to have some advantages over the earlier
techniques, although it too has drawbacks. The method uses measurements
obtained by estimating the parameters of a Markov model that assumes two
stages of learning. The main stages of the model are graphed in Figure 3-8,
and several applications of the model are discussed in the accompanying
text.

The Markov model involves no commitment to a theoretical position. It
merely says that learning involves two stages, and the two stages occur in
a specified way. The strong assumptions are that (1) accomplishment of each
stage is an all-or-none event, and (2) the stages are sequential and
independent; therefore, the second stage cannot occur before the first
stage, and the probability of accomplishing the second stage in any given
number of trials is unaffected by the number of trials it took to accomplish
the first stage. Investigators who use such techniques as analysis of
variance make assumptions that are different in content, but equivalent in
their status in the analysis, when they assume that variances within
experimental conditions are equal, and that scores are distributed normally.
An important difference is that, in using a Markov model, the distribution
of scores obtained in the experiment is used as data to check the
assumptions of the analysis, rather than as something about which an
assumption is made. Also, the variances of scores obtained in different
experimental groups should not generally be equal; distributions for each
experimental condition can be compared with predicted distributions derived
from the model to see whether the assumptions of the analysis can be
rejected for any of the experimental groups.
To allow for some irregularities that can occur at the beginning of the
experiment and on trials when a transition occurs, the model used is
slightly more complicated than Figure 3-8 shows. State U and State L are the
initial and terminal learned states of an item, corresponding to Figure 3-8.
State I in Figure 3-8 is the state of an item after the first stage of
learning has been accomplished. According to the model, correct responses
never occur as long as an item is in State U; correct responses always occur
after an item has reached State L. But in State I, the correct response
occurs with some degree of probability. It is convenient for analysis and
necessary for some versions of the model to specify two states corresponding
to the intermediate stage of learning: the state called I in Figure 3-8 is
divided into two states, called E and C. State E applies when an item has
left State U and an error occurs. State C applies when an item has not yet
reached State L and a correct response occurs. More formally, the states of
the model are as follows:

U: the state of an item at the beginning of an experiment, before the first
stage of learning has been accomplished;

E: the state of an item after the first, but before the second, stage of
learning has been accomplished, on trials when errors occur;

C: the state of an item after the first and before the second stage, on
trials when correct responses occur;

L: the state of an item after both stages of learning have been
accomplished.

It is assumed that only errors occur in States U and E, and only correct
responses occur in States C and L.


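\[
P =
\begin{array}{c|cccc}
      & L_{n+1} & E_{n+1}   & C_{n+1}       & U_{n+1} \\ \hline
L_n   & 1       & 0         & 0             & 0       \\
E_n   & d       & (1-d)q    & (1-d)p        & 0       \\
C_n   & c       & (1-c)q    & (1-c)p        & 0       \\
U_n   & ab      & a(1-b)e   & a(1-b)(1-e)   & 1-a
\end{array}
\tag{4-2}
\]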


[...] the second stage is not accomplished along with the first stage
[...]. On the first trial in the intermediate states, the probability of an
error is e. [...] probability of completing learning [...] after the first
stage has been accomplished; if learning is not completed on [...], q is the
probability of an error on the next trial, and p = 1 - q is the probability
[...]
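The transition probabilities can be written out as a small table in code, which makes it easy to confirm that every row is a proper probability distribution. The following is only a sketch: the function names and the numerical parameter values are illustrative assumptions, and the initial vector assumes that an item starts in State U with probability s, in State L with probability t, and otherwise in the intermediate states with error probability r, which is how the parameters are used in the later discussion.

```python
def transition_matrix(a, b, c, d, e, q):
    """Rows and columns are ordered L, E, C, U.

    Entry [i][j] is the probability of moving from state i on trial n
    to state j on trial n + 1.
    """
    p = 1.0 - q  # probability of a correct response in the intermediate stage
    return [
        [1.0, 0.0, 0.0, 0.0],                                      # L is absorbing
        [d, (1 - d) * q, (1 - d) * p, 0.0],                        # from E
        [c, (1 - c) * q, (1 - c) * p, 0.0],                        # from C
        [a * b, a * (1 - b) * e, a * (1 - b) * (1 - e), 1.0 - a],  # from U
    ]


def initial_vector(s, t, r):
    """Assumed initial probabilities of L, E, C, U after the first study trial."""
    intermediate = 1.0 - s - t
    return [t, intermediate * r, intermediate * (1 - r), s]


# Illustrative (made-up) parameter values.
P = transition_matrix(a=0.3, b=0.2, c=0.25, d=0.15, e=0.6, q=0.5)
start = initial_vector(s=0.7, t=0.05, r=0.6)

for label, row in zip("LECU", P):
    print(label, [round(x, 4) for x in row], "row sum =", round(sum(row), 10))
```

Setting t = ab, r = e, and s = 1 - a in `initial_vector` makes the initial vector equal to the U row of the matrix, which is the sense in which the initial trial then behaves like any later trial.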
When the model is used to obtain measurements of difficulty of the two
stages, the information needed is in the values of parameters. To measure
the difficulty of the first stage we need values of 1 - s and a; 1 - s is
the probability of accomplishing the first stage on the initial trial, and a
is the probability of accomplishing the first stage on trials after the
initial trial. If the first stage is easy, these parameters will be large
and few trials will be needed for most items to accomplish the first stage.

To measure the difficulty of the second stage, values of b, c, d, and
t/(1 - s) are needed. The probability of accomplishing the second stage on
the initial study trial, if the first stage was accomplished on that trial,
is t/(1 - s). The probabilities of accomplishing the second stage on trials
later in learning are b, c, and d. If the second stage is easy, then b, c,
d, and t/(1 - s) should be large; if the second stage is hard, these
quantities should be small.
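To see how 1 - s and a translate into trials spent on the first stage, a short simulation can be compared with the closed-form mean. This is a sketch under the model's stated assumptions: an item is still in State U after the initial trial with probability s, and on each later trial it leaves State U with probability a, so the expected number of trials spent in State U is s/a. The function names and parameter values are illustrative.

```python
import random


def expected_trials_in_U(s, a):
    """Mean number of trials an item spends in State U (0 if it never enters)."""
    return s / a


def simulate_trials_in_U(s, a, n_items, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_items):
        if rng.random() < s:          # still in U after the initial trial
            total += 1
            while rng.random() >= a:  # first stage not accomplished on this trial
                total += 1
    return total / n_items


easy = expected_trials_in_U(s=0.4, a=0.6)   # large 1 - s and a: few trials in U
hard = expected_trials_in_U(s=0.8, a=0.2)   # small 1 - s and a: many trials in U
estimate = simulate_trials_in_U(s=0.7, a=0.3, n_items=100_000, seed=1)
print(easy, hard, estimate)
```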

The task of obtaining numerical values for the quantities in the model is
the problem of estimating parameter values. Various methods of estimation
are possible. One that meets nearly all the desirable statistical criteria
is the method of maximum likelihood (see Restle & Greeno, 1970, chap. 9, for
a general discussion). It is necessary to express the likelihood of the data
as a function of the parameters of the model. Recall the discussion for the
all-or-none model in Chapter 3. To see how the ideas apply to the two-stage
model, consider the likelihood of a specific sequence for illustration:

    X = 1 1 0 1 1 0 0 0 0 ...

where 1 stands for an error and 0 stands for correct response. The sequence
of observed responses could have been produced by any of several different
sequences of theoretical states:

    Y1 = E E C E E L L L L ...;
    Y2 = U E C E E L L L L ...;
    Y3 = U U C E E L L L L ...;
    Y4 = E E C E E C L L L ...;
    Y5 = U E C E E C L L L ...;
    Y6 = U U C E E C L L L ...;
    Y7 = E E C E E C C L L ...;

and so on. Each sequence Yi has a likelihood that can be calculated directly
from the initial and transition probabilities of the Markov model. For
example,

\[
L(Y_2) = s\,a(1-b)e\,(1-d)p\,(1-c)q\,(1-d)q\,d
       = sa(1-b)e(1-c)(1-d)^2 p q^2 d.
\]
Since X could have been produced by any of the Yi, the likelihood of X is
just the sum of the likelihoods of the Yi. In this case,

\[
L(X) = \bigl[(1-s-t)\,r\,(1-c)\,p\,(1-d)^3 q^3
       + sa(1-b)e(1-c)\,p\,(1-d)^2 q^2
       + s(1-a)a(1-b)(1-e)(1-c)(1-d)\,q^2\bigr]
       \left[d + \frac{(1-d)pc}{1-(1-c)p}\right].
\]
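Summing over state sequences by hand becomes tedious for longer response records, and the same sum can be accumulated trial by trial with a forward recursion. The sketch below is an illustration rather than the authors' procedure: it assumes the transition probabilities of Equation 4-2, an initial vector in which an item starts in U with probability s, in L with probability t, and otherwise in the intermediate states with error probability r, and made-up parameter values. A brute-force enumeration over all state paths is included only as a check on the recursion.

```python
from itertools import product

STATES = ("L", "E", "C", "U")
# Responses are determined by the state: errors in U and E, correct in C and L.
ERROR_STATE = {"L": False, "E": True, "C": False, "U": True}


def build_model(s, t, r, a, b, c, d, e, q):
    """Return (initial probabilities, transition probabilities) as dicts."""
    p = 1.0 - q
    mid = 1.0 - s - t  # probability of starting in the intermediate stage
    start = {"L": t, "E": mid * r, "C": mid * (1.0 - r), "U": s}
    trans = {
        "L": {"L": 1.0, "E": 0.0, "C": 0.0, "U": 0.0},
        "E": {"L": d, "E": (1 - d) * q, "C": (1 - d) * p, "U": 0.0},
        "C": {"L": c, "E": (1 - c) * q, "C": (1 - c) * p, "U": 0.0},
        "U": {"L": a * b, "E": a * (1 - b) * e,
              "C": a * (1 - b) * (1 - e), "U": 1.0 - a},
    }
    return start, trans


def consistent(state, response):
    """True if `state` could have produced `response` (1 = error, 0 = correct)."""
    return ERROR_STATE[state] == (response == 1)


def likelihood(x, start, trans):
    """Likelihood of the response sequence x, by forward recursion."""
    alpha = {st: start[st] if consistent(st, x[0]) else 0.0 for st in STATES}
    for response in x[1:]:
        alpha = {
            nxt: sum(alpha[cur] * trans[cur][nxt] for cur in STATES)
            if consistent(nxt, response) else 0.0
            for nxt in STATES
        }
    return sum(alpha.values())


def likelihood_by_enumeration(x, start, trans):
    """Same quantity, summing explicitly over every state sequence."""
    total = 0.0
    for path in product(STATES, repeat=len(x)):
        if not all(consistent(st, resp) for st, resp in zip(path, x)):
            continue
        prob = start[path[0]]
        for cur, nxt in zip(path, path[1:]):
            prob *= trans[cur][nxt]
        total += prob
    return total


params = dict(s=0.7, t=0.05, r=0.6, a=0.3, b=0.2, c=0.25, d=0.15, e=0.6, q=0.5)
start, trans = build_model(**params)
x = [1, 1, 0, 1, 1, 0, 0, 0, 0]   # the illustrative sequence, cut off at 9 trials
print(likelihood(x, start, trans))
```

Because the record is cut off after nine trials, the value is the likelihood of that finite prefix; the likelihood of the first five responses alone equals the first bracketed factor of L(X).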

(Restle & Greeno, 1970, chap. 2, gave a general discussion of derivations
like this one.) The likelihood of the data from an experiment is the product
of the likelihoods of all the individual sequences observed in the
experiment, assuming that the various sequences were independent events.
Note that this uses the simplifying assumption that all sequences were
obtained with the same parameter values. Thus, individual differences among
subjects and [...]

[...] provides the basis for the estimation procedure. [...] assigned to
the parameters would correspond to [...] data. To estimate parameters by the
method of maximum likelihood, we choose values of parameters that make the
[...]. Well-known theorems in statistics show that [...] use all the
information in the data that is relevant to the value of the parameters
(technically, the maximum likelihood estimates are functions of sufficient
statistics), and the estimates are as efficient as possible, [...] having as
small a standard error of estimate as possible. The maximum value of the
likelihood function cannot be [...] numerical method. [...]

There are nine parameters in the model as it is stated in Equations 4-1 and
4-2. However, only seven parameters can be estimated from the data of an
experiment. One of the identifiable parameters is the value of a, useful
because [...] of the first stage when [...] identifiable, which means that
[...] inferences, especially [...] of identifiability, see Restle & Greeno,
1970, chap. [...] of identifiability for this model [...].

Some hypotheses that simplify the model can be tested in data. One such
hypothesis is the idea that [...] trial has effects exactly like those of
later trials. If all items [...], and if the initial study trial is the same
as later trials, then the probability of being in State U after the initial
study trial should be 1 - a, the probability of being in State L after the
initial study trial should be ab, and the probability of an error if the
item is in the intermediate states should be e. This can be stated as a
relationship among the parameters

\[
t = ab, \qquad r = e, \qquad s = 1 - a.
\tag{4-3}
\]
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Other simplifying hypotheses that are testable include e = q, and if that is 
accepted, then b — d is also testable. These two equations express the idea 
that the probabilities of completing the second stage of learning and of giving 
a correct response are the same on the first trial after an item leaves State 
U as they are on later trials. 

To determine whether one or more simplifications of the model are acceptable,
likelihood ratio tests are used. The procedure involves finding maximum
likelihood estimates of the parameters of the general model, and then finding
maximum likelihood estimates of the parameters with a restriction imposed.
The value of the likelihood obtained with the restriction can be no higher than
the maximum likelihood obtained without the restriction, and the ratio of
the two values (restricted over general) is called $\lambda$. If the restricted
version is correct, the value of $-2 \log_e \lambda$ is asymptotically distributed
as chi square with degrees of freedom equal to the number of restrictions.
(General discussion of this likelihood ratio test is found in many statistics
texts, such as Wilks, 1962.) Note that restrictions have to be imposed on
identifiable parameters for a hypothesis to be testable in this or any other
way, and the degrees of freedom in the chi square test equal the number of
restrictions on identifiable parameters. For example, the restriction given as
Equation 4-3 involves three of the parameters of the model, but only two of the
model's identifiable parameters are restricted by Equation 4-3. Thus, in testing
that hypothesis, the distribution consulted in evaluating the value of
$-2 \log_e \lambda$ is $\chi^2(2)$. Another example involves the hypothesis
b = d. If e ≠ q, then no restriction in the identifiable parameters is implied
by b = d, and that restriction is then not testable. But a restriction in
identifiable parameters is implied by e = q, and if e = q is accepted, then
b = d does impose a restriction on identifiable parameters.
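
The decision rule described above can be sketched in a few lines of Python.
This is only an illustration, not the procedure the authors actually ran: the
function names `chi2_sf` and `likelihood_ratio_test` and the two log-likelihood
values are hypothetical, and the upper-tail chi square probability is computed
from standard closed-form series so that nothing beyond the standard library is
assumed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi square with `df` degrees of freedom > x)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i < df/2} y**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: start from the df = 1 case and add one term per extra 2 df.
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) * math.exp(-y) / math.gamma(1.5)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= y / (j + 0.5)
    return total


def likelihood_ratio_test(loglik_general, loglik_restricted, n_restrictions):
    """Return (-2 log lambda, p) for a restricted model nested in a general one.

    `n_restrictions` must count restrictions on identifiable parameters only.
    """
    statistic = -2.0 * (loglik_restricted - loglik_general)
    return statistic, chi2_sf(statistic, n_restrictions)


# Hypothetical maximized log-likelihoods, not taken from any experiment.
stat, p = likelihood_ratio_test(-412.30, -413.10, n_restrictions=2)
print(round(stat, 2), round(p, 3))
```

As a check on the helper, `chi2_sf(8.44, 6)` comes out at about .21, which
agrees with the probability reported later in the chapter for that statistic.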


Tests of simplifying restrictions such as Equation 4-3, e = q, and b = d
represent preliminary work with the model. The main analyses involve tests
of significance comparing different experimental conditions in the difficulty
of the two stages of learning. Likelihood ratio tests are also used in these
analyses. Suppose, for example, that we want to test whether two groups
differ in the value of a. A maximum likelihood value is obtained for all the
data of both groups, with all parameters free to vary. A second maximum
likelihood value is obtained with a single value of a used for both sets of
data. The restricted value of the likelihood divided by the maximum likelihood
without the restriction gives a likelihood ratio $\lambda$. In this case
$-2 \log_e \lambda$ is asymptotically distributed as chi square with one degree
of freedom if the two groups really have the same value of a. Tests can be
carried out using more than one parameter, and the degrees of freedom for the
chi square distribution equal the number of parameters involved in the test.
In this way, we can test whether two groups differed in the difficulty of the
first stage of learning, or in the difficulty of the second stage of learning,
or in performance during the intermediate stage of the learning process, or in
some combination of these. It should be noted that these tests, like the
earlier ones, are possible only when the hypotheses being tested impose
restrictions on the identifiable parameters of the groups being tested, and the
appropriate degrees of freedom of a test equal the number of restrictions
imposed on identifiable parameters.
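
A two-group comparison of this kind can be illustrated with a deliberately
simplified stand-in for the full model. The sketch below assumes that each item
leaves the unlearned state with a constant probability a on every trial, so the
number of trials needed is geometric. That one-parameter model, the function
names, and the trial counts are all assumptions made for the example; the
two-stage model itself has more parameters and needs numerical maximization.

```python
import math


def geometric_loglik(trials, a):
    """Log-likelihood of trials-to-leave counts under a per-trial probability a."""
    n = len(trials)
    failures = sum(trials) - n          # trials on which the item did not leave
    loglik = n * math.log(a)
    if failures > 0:
        loglik += failures * math.log(1.0 - a)
    return loglik


def mle_rate(trials):
    """Maximum likelihood estimate of a for geometric data."""
    return len(trials) / sum(trials)


def compare_rates(group1, group2):
    """Likelihood ratio test of a single common a against one a per group."""
    ll_general = (geometric_loglik(group1, mle_rate(group1))
                  + geometric_loglik(group2, mle_rate(group2)))
    pooled = list(group1) + list(group2)
    ll_restricted = geometric_loglik(pooled, mle_rate(pooled))
    statistic = max(-2.0 * (ll_restricted - ll_general), 0.0)
    # One restriction, so the reference distribution is chi square with 1 df.
    p = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p


# Hypothetical trial counts for two groups of eight items each.
easy = [1, 2, 1, 3, 2, 1, 4, 2]
hard = [3, 5, 2, 7, 4, 6, 3, 5]
stat, p = compare_rates(easy, hard)
print(round(stat, 2), round(p, 3))
```

The general model here has two free parameters and the restricted model has
one, which is why a single degree of freedom is used.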


ANALYSIS OF HUMPHREYS’ EXPERIMENT 
Each of the lists shown in Table 4-1 was learned by 18 subjects, giving 144
sequences in each condition for analysis. Each item was considered learned
if a criterion of five consecutive correct responses was given.


Simplifying assumptions 


| Sots S 
П consider the initial study trial "t 
further hypothesis i 


any group: However, note for future 
S restriction for al] the groups in whose 
data that assumption could be tested, 


Table 4-2 Results of Testing Simplifying Assumptions 


Hypothesis: 1 = ab, 


Stimuli Responses , — &S=1li@ ron А чая 
—2 log. 2 p —2 log, 1 p —2log,A р 
Easy Easy 1.30 25 .81 65 
Hard Easy зи 07 )u v E - 
Easy Hard 199 “16 107 29 244 12 
Hard Hard 1.81 18 10.13 ‘002 aa у 


Source: From Humphreys & Greeno, 1970. 
Goodness of fit


The next question is whether the model fit these data well enough for the 
parameter values to be meaningful. As was mentioned earlier, goodness of 
fit can be checked using the distributions of statistics taken from the data. 
Several statistical properties of data were examined by Humphreys and 
Greeno. Two that are easy to understand intuitively are the number of trials 
and the number of errors occurring after the first correct response. In the 
model, the first correct response can not occur until the item has left State U. 
This means that when the first correct response occurs, the item either is in 
State L already, or there is just one stage of learning left to be accomplished. 
This leads to the rather strong prediction that if only data after the first
correct response are considered, these data will have properties like those of
all-or-none learning. Let X be the number of errors after the first correct
response, and let Y be the number of trials between the first correct response
and the beginning of the criterion string of correct responses at the end. From
the model, it can be shown that X and Y have the same form as the number
of errors and trial of last error in all-or-none learning, given in Equations
3-1 and 3-2. That is,

I 
; 4-4 
feme | —3ü—4w 1: me 


A k=0 
4-5 
s = = [sa – ид "а к21, gi 


where u = (pc + qd)/(q + pc), v = q + pc, and z is a rather complicated
function of all the parameters of the model. (See Greeno, 1968, for details.)
The distributions of X and Y for the four experimental groups are shown
in Figures 4-1 and 4-2. The data are given as histograms, and the theoretical
distributions based on parameters estimated separately for the four groups
are given by the connected dots. A suitable statistical test of goodness of fit
uses the chi square statistic. Frequencies were pooled in adjacent points of
the distribution to obtain cells that have theoretical frequencies of at least
5.0. Then the goodness-of-fit chi square statistic was calculated in the usual
way. Degrees of freedom for the test are not well defined, because estimates of
parameters are taken from all of the data, rather than from the specific
frequency distributions used in the tests. However, a theorem by Chernoff
and Lehmann (1954) states that the distribution of the chi square statistic is
bounded by $\chi^2(n - 1)$ and $\chi^2(n - 1 - m)$, where n is the number of
cells in the distribution (after pooling) and m is the number of parameters
estimated. To compute the theoretical distribution of X, two quantities are
needed, z and u. Similarly, calculation of the distribution of Y uses two
theoretical values, z and uv. Thus, for these two statistics, the bounds on the
distribution of the chi square statistic are $\chi^2(n - 1)$ and
$\chi^2(n - 3)$, where n is the number of cells in the pooled frequency
distribution. As Figures 4-1 and 4-2 show, the data agreed quite well with the
theoretical distributions for both X and Y. In only one of the eight
distributions tested was the discrepancy large enough to produce statistical
significance, and even in that case the form of the empirical distribution
seems to follow the general pattern of decreasing frequencies predicted by the
all-or-none model. It seems reasonable to conclude that learning that occurred
after the first correct response can be described well as an all-or-none
process.
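
The pooling step and the two bounding degrees of freedom can be made concrete
with a short sketch. The frequencies below are invented for illustration (they
sum to 144 only to mirror the number of sequences per condition), and the
function names are hypothetical. The pooling rule shown, which merges adjacent
cells from the left and folds any short tail into the last cell, is one
reasonable reading of "pooled in adjacent points" and is an assumption.

```python
def pool_adjacent(observed, expected, min_expected=5.0):
    """Merge adjacent cells until every pooled cell has expected >= min_expected."""
    pooled_obs, pooled_exp = [], []
    acc_obs = acc_exp = 0.0
    for obs, exp in zip(observed, expected):
        acc_obs += obs
        acc_exp += exp
        if acc_exp >= min_expected:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
            acc_obs = acc_exp = 0.0
    if acc_exp > 0:                      # leftover tail smaller than the minimum
        if pooled_exp:
            pooled_obs[-1] += acc_obs
            pooled_exp[-1] += acc_exp
        else:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
    return pooled_obs, pooled_exp


def goodness_of_fit(observed, expected, n_estimated):
    """Return the chi square statistic and the (lower, upper) df bounds."""
    obs, exp = pool_adjacent(observed, expected)
    statistic = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    n_cells = len(exp)
    return statistic, (n_cells - 1 - n_estimated, n_cells - 1)


# Invented frequencies for the number of errors after the first correct response.
observed = [60, 30, 20, 12, 10, 6, 3, 2, 1]
expected = [58.0, 32.0, 19.0, 13.0, 9.0, 6.0, 4.0, 2.0, 1.0]
stat, (df_low, df_high) = goodness_of_fit(observed, expected, n_estimated=2)
print(round(stat, 3), df_low, df_high)
```

With two estimated quantities, as for X and Y in the text, the statistic would
be referred to chi square distributions with both bounding degrees of freedom,
and the fit is judged doubtful only if the two disagree.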

Figure 4-3 shows the distributions obtained for errors before the first
correct response. The upper panel shows frequencies of sequences having no
errors after the first correct response, and the lower panel shows frequencies
of sequences with one or more errors after the first correct response. The
goodness-of-fit chi square statistics were calculated by first pooling adjacent
cells to obtain distributions that had theoretical frequencies of at least 5.0,
and then summing the terms of the chi square statistic based on deviations
in both of the component distributions. Note that the data agreed well with
the theoretical distributions for groups HE, EH, and HH. For group EE the
discrepancy between empirical and theoretical distributions may have been
large enough to reject the null hypothesis at the .05 level, but it is not definite
because of the uncertainty about degrees of freedom.

[...] agreement between the data and the predictions based on the model to
make it reasonable to use the model in measuring the difficulty of the two
stages of learning.


Tests of Parameter Invariance
error in Humphreys' experiment (Humphre


of $-2 \log_e \lambda$, but the degrees of freedom for the tests are not well
understood, because the exact restrictions imposed on identifiable parameters
by the assumptions of invariance have not been worked out. A test for
invariance of just one of these parameters could not be carried out. Any one of
these parameters could be set arbitrarily for one of the groups, within limits,
and the remaining parameters could be adjusted to compensate. The numbers
given as degrees of freedom in rows of Table 4-3, other than the first row, are
one fewer than the number of parameters tested, and this probably is a
reasonable guess as to the appropriate degrees of freedom. The general
conclusion is that most hypotheses involving invariance of b, c, and d between
these pairs of groups apparently are acceptable by the usual statistical criteria.


Since little or no difference appeared between pairs of groups in the
second-stage learning parameters, Humphreys and Greeno tested invariance of
these parameters across all groups. The hypothesis that b, c, and d were all
constant over all four groups probably involves 5 degrees of freedom. There
are 20 identifiable parameters for the four groups, and under the restriction
there are three parameters with four values each (a, e, and q), along with
single values of b, c, and d. The hypothesis can be rejected
($\chi^2(5) = 14.1$, p < .02).


Table 4-3 Tests of Parameter Invariance between Pairs of Groups

Invariant     Degrees of
Parameters    Freedom       EE, HE        EH, HH        EE, EH          HE, HH
a             1             5.97(.015)    6.69(.010)    14.23(.0002)    16.23(.000[...]
b, c          1             1.21(.28)     .09(.77)      .47(.50)        .00(1.0)
b, d          1             28633)        01095         .25(.62)        .00(1.0)
c, d          1             .02(.89)      5.34(.021)    .00(1.0)        .30(.59)
b, c, d       2             16559)        5.38(.068)    .74(.70)        .69(.71)

Source: From Humphreys & Greeno, 1970.


When pairs of the second-stage parameters were held constant, the hypotheses
were all acceptable in the data. Partial results are given in Table 4-4,
which shows the values estimated for the parameters held constant in the
[...] the test statistics, the significance level at which the hypotheses
could be rejected, and the estimates obtained for the remaining second-stage
learning parameter. The fact of interest in Table 4-4 is that when two
second-stage parameters were held constant, the estimates obtained for the
remaining parameter were relatively close to each other for groups having
the same stimuli (EE-EH and HE-HH), but differences were somewhat larger
for groups differing in stimuli (EE-HE and EH-HH). Since the estimates of
these parameters interact, the result in Table 4-4 suggested that the
second-stage parameters may have depended mainly on the stimuli, being fairly
constant over the groups [...] responses. Further findings [...] this
reasonable are in Table 4-3, where all hypotheses involving groups EE, EH and
groups HE, HH were acceptable. [...] hypothesis involving groups differing
only in response [...] was rejected (c and d [...]

[...] stage parameters depended only on [...] Greeno tested the hypothesis
that b, c, and d [...] stimuli. This is a testable hypothesis [...] values of
b, c, and d, there are [...] 20 identifiable parameters. [...] a stronger
hypothesis, using [...] acceptable hypothesis in the three experimental groups
[...] hypothesis was testable (recall Table 4-2). Using the hypothesis [...]
same, that their common value depended only on stimuli, and that c depended
only [...] are 16 free parameters. However, c was also restricted [...] have
been studied in connection with other questions about the learning process
(see Greeno, 1967). If it is assumed that the second stage [...] when the
subject failed to respond correctly in the intermediate [...] completion of
the second stage occurred with equal [...] trials in the intermediate state,
then


Table 4-4 Tests of Parameter Invariance Over All Groups

Invariant                                          Varying
Parameters        $-2 \log_e \lambda$    p         Parameter    EE     EH     HE     HH
b(.26), c(.00)    3.95                   .14       d            .36    .28    .25    [...]
b(.16), d(.22)    2.03                   .37       c            .19    [...]
c(.18), d(.17)    5.62                   .06       b            [...]
Table 4-5 Parameter Estimates Assuming b = d and Second Stage Dependent
Only on Stimuli

Hypothesis    Parameter    EE     EH     HE     HH
b = c = d     a            .29    .18    .22    .13
              b = c = d    .20    .20    .15    .15
              e            .55    .34    .68    [...]
              q            .43    .55    .52    .57
b = d,        a            .29    .18    .21    .13
c = 0         b = d        .34    .34    .26    .26
              e            .68    .38    .66    [...]
              q            .54    .64    .60    .64

Source: From Humphreys & Greeno, 1970.

c = d. Either of these hypotheses makes the model identifiable for a single
group, when it is added to the restriction t = ab, r = e, s = 1 − a.

Results for both tests were positive. With either identifying restriction
there are 14 free parameters, giving 6 degrees of freedom. With b = c = d,
the value of $-2 \log_e \lambda$ was 8.44, which corresponds to a probability
of .21 in $\chi^2(6)$. Testing b = d with c = 0 gave $-2 \log_e \lambda$ = 4.37,
p = .62. Table 4-5 gives the parameter values estimated under the hypotheses.
(Recall that only two values of b = d were allowed under the hypotheses.)
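
The count of free parameters behind the 6 degrees of freedom can be written
out explicitly. The sketch below is a bookkeeping aid under the reading given
in the text: a, e, and q take a separate value in each of the four groups, the
common value of b = d is allowed one value per stimulus level, and c is fixed
by the identifying restriction. The variable names and the helper
`chi2_sf_even` are hypothetical.

```python
import math

GROUPS = ["EE", "EH", "HE", "HH"]
STIMULUS_LEVELS = ["easy", "hard"]

# Stated in the text for the four groups taken together.
identifiable_general = 20

free_restricted = (
    3 * len(GROUPS)           # a, e, q: one value per group
    + len(STIMULUS_LEVELS)    # common value of b = d: one value per stimulus level
)                             # c adds nothing: it is fixed by c = 0 or c = d

degrees_of_freedom = identifiable_general - free_restricted


def chi2_sf_even(x, df):
    """Upper-tail chi square probability, valid for even df only."""
    y = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= y / i
        total += term
    return math.exp(-y) * total


p_value = chi2_sf_even(8.44, degrees_of_freedom)
print(free_restricted, degrees_of_freedom, round(p_value, 2))
```

The resulting probability for the statistic 8.44 is close to the .21 reported
in the text.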


THEORETICAL IMPLICATIONS 


The purpose of obtaining measurements of difficulty in the two stages was
to obtain evidence on the question of what the main processes are that occur
in each of the two stages of learning. Keep in mind that a stage in the
Markov model involves a series of trials in which the probability of correct
response remains relatively constant. The probability of correct response
starts at zero, then at some later time changes to p, an estimated parameter.
The measurement obtained by estimating a and s (the first-stage parameters)
is the difficulty of accomplishing that change in response probability. The
measurement obtained by estimating b, c, d, and t/(1 − s) (the second-stage
parameters) is the difficulty of accomplishing the change in probability from
p to [...]

The main result was that both stimulus and response variables
affected the difficulty of the first of these operationally defined stages
[...] On the other hand, the results were consistent with the hypothesis
that the second stage depended only on properties of the stimuli.

The simplest interpretation of the findings would be that storage of
[...] occurred in the first stage of learning, and that the association
[...] reliably retrievable, mainly through stimulus discrimination,
in the second stage. This interpretation fits well with the Gestalt idea that
[...] learning occurs later if it is needed to make the [...] representation
easy to retrieve. We assume that learning to retrieve an item involves
incorporating it into a retrieval network of the kind discussed in Chapter [...]

In carrying out their analysis, Humphreys and Greeno had expected a
result consistent with a simple application of associationist theory. It was
expected that the first-stage parameters would largely reflect the process of
response learning, and that the second-stage parameters would be related to
the process of learning associative connections. According to this view, it is
[...] mainly on the response variable, perhaps [...] equal in groups EE and
HE, and having a different single value in [...] stage parameters were
expected to depend on both [...] variables, since both elements are involved
in [...] Of course, this expected pattern of results was [...]

[...] it is not as simple as the [...] Humphreys and Greeno. It could be
assumed that response acquisition occurs early in the learning process, as
associationist theorists have hypothesized. But the accomplishment of response
learning need not in itself produce a change in response probability. It can
be further assumed that a certain amount of associative strength needs to be
built up before the probability of correct response changes from zero.
According to this view of things, the learning that occurs in the first stage
of the Markov model would include response acquisition and the beginning of
associative learning. The second stage of the Markov model would correspond to
the completion of associative learning. If this interpretation is adopted, the
data in this experiment indicate that the difficulty of associative learning
depended mainly on stimulus similarity and did not seem to depend on response
difficulty.


This associationist interpretation of Humphreys and Greeno's results is
inconsistent with the model used in analyzing the data. According to the
model, learning occurred in two discrete and sequential steps. The
associationist interpretation given here says that in the first stage at least
two things are happening: acquisition of the response and acquisition of some
associative strength. While this theory is contradicted by the model used to
analyze the data, it is not necessarily contradicted by the data. Even though
the data were consistent with the discrete two-stage model, it is known that
more complicated processes can generate data in approximate agreement with
predictions from a Markov model (Heine, 1970), especially if provision is
made for differences in learning difficulty among subjects and items (Restle,
1965).
[...] that contradicted the simple associationist idea about stages
[...] contradicted the expectations with which Humphreys and
Greeno began their analysis. The interpretation they developed, which
[...] storing representations of pairs followed by learning to retrieve,
[...] in response to the empirical findings. The advantage of the stages
specified [...] cognitive interpretation is that they correspond in a rather
direct way to the states of a statistical model that agrees well with data and
whose parameters can be estimated for measurement of the difficulty of stages
of learning. This technical advantage probably should not be interpreted as
evidence for the cognitive view or against the associationist interpretation,
except insofar as the cognitive view seems to provide a simpler analysis of
the results.


REPLICATION OF HUMPHREYS’ EXPERIMENT 
An experiment was conducted by Greeno with the assistance of Herbert
[...], to check the pattern of results obtained in Humphreys' experiment.
In the replication, stimuli were either letters (easy) or overlapping letter
pairs (hard), and responses were either high-frequency words (easy) or
low-frequency words (hard). The lists are shown in Table 4-6. In this study,
the stimulus variable had a greater effect on overall difficulty than the
response variable. The mean total errors per item in the four groups were:
EE, 4.25, EH, 5.09, HE, 10.63, and HH, 11.02.

Tests of simplifying assumptions led to acceptance of b = d and e = q for
groups EE and EH, although the assumption about the initial study trial,
Equation 4-3, could not be accepted for these groups. On the other hand,
Equation 4-3 was acceptable for groups HE and HH, while b = d was not.
The fit of the model to these data was not as good as with Humphreys'
study, but it seemed tolerably good for purposes of a replication.

Because different simplifying assumptions were acceptable for the various
groups, direct comparisons of parameter values do not give a very meaningful
picture of the relative difficulty of stages. For example, in group EE, the
parameter a was estimated to be .11, while in group HE a was .26. However,
Table 4-6 Lists Used in Replication of Humphreys' Experiment

EE             EH             HE             HH
[...]—Touch    [...]—Delft    FQ—Touch       FQ—Delft
V—[...]        V—Blear        VF—[...]       VF—Renal
F—Grain        F—Renal        VQ—Grain       VQ—Anode
C—Stand        C—Houri        QV—Stand       QV—Houri
L—Earth        L—Ingot        QF—Earth       QF—Ingot
S—Offer        S—Anode        FV—Offer       FV—Blear

Source: From Greeno, 1970.

it would be a mistake to conclude that the first stage was harder for group
EE, because 1 − s, the probability of accomplishing the first stage in the
initial study trial, was .94 in group EE, while 1 − s was .26 in group HE
under the simplifying assumption used for that group. Summary measures of
difficulty of the two stages can be obtained by calculating from the theoretical
parameters. Let $E(Z_1)$ be the mean number of trials before the first stage of
learning is accomplished, and let $E(Z_2)$ be the mean number of trials after
the first stage is accomplished but before the second stage is accomplished.
In terms of the states of the model, $E(Z_1)$ is the mean number of trials
(including the initial study trial) spent in State U, and $E(Z_2)$ is the mean
number of trials spent in the intermediate States E and C. The value of
$E(Z_1)$ is straightforward:

$$E(Z_1) = 1 + s/a. \tag{4-6}$$
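
Equation 4-6 can be checked numerically. The initial study trial always counts
as one trial in State U; with probability s the item is still in State U
afterwards, and it then needs a geometric number of further trials with mean
1/a. The sketch below compares the closed form with a simple simulation. The
function names are hypothetical, and the inputs are the rounded estimates quoted
above for group HE (a = .26 and 1 − s = .26), so the result is only
approximate.

```python
import random


def expected_trials_in_U(a, s):
    """Closed form of Equation 4-6: one study trial plus s/a further trials."""
    return 1.0 + s / a


def simulate_trials_in_U(a, s, n_items=100_000, seed=0):
    """Monte Carlo estimate of the mean number of trials spent in State U."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_items):
        trials = 1                      # the initial study trial
        if rng.random() < s:            # still unlearned after the study trial
            trials += 1
            while rng.random() >= a:    # fails to leave State U on this trial
                trials += 1
        total += trials
    return total / n_items


a_he, s_he = 0.26, 1.0 - 0.26           # rounded values quoted for group HE
print(round(expected_trials_in_U(a_he, s_he), 2))
print(round(simulate_trials_in_U(a_he, s_he), 2))
```

The closed form gives about 3.85 for these rounded inputs, close to the 3.90
listed for group HE in Table 4-7; the small gap is presumably due to rounding
of the quoted estimates.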


The value of $E(Z_2)$ depends on the initial probabilities and on weighted
averages of the second-stage learning parameters. Let R = rd + (1 − r)c,
E = ed + (1 − e)c, Q = qd + pc. Then

$$E(Z_2) = (1 - s - t)R + s(1 - b)E + \left(1 + \frac{1}{Q}\right)\left[(1 - s - t)(1 - R) + s(1 - b)(1 - E)\right]. \tag{4-7}$$


Use of Equations 4-6 and 4-7 requires that identifying restrictions be
employed to allow empirical estimates to determine parameter values.
Greeno (1970) used the assumption c = 0 in calculating values from the data
of this replication. The results are shown in Table 4-7, and they seem to
confirm the pattern of results from Humphreys' study quite nicely, in that
the estimated difficulty of the first stage is affected by both stimulus similarity
and response frequency, and the difficulty of the second stage essentially
is dependent only on the stimulus variable.


PAGEL'S MEASURES OF EFFECTS OF 
MEANINGFULNESS AND SIMILARITY 


as all-or-none in her experiment. A likelihood ratio test is possible, since
the all-or-none model is a special case of the two-stage model. Thus, the
all-or-none hypothesis can be used as the null hypothesis of a statistical
test, with the two-stage model being
Table 4-7 Measures of Difficulty for Stages of Learning in Replication of
Humphreys' Experiment

Measure                     Group
                            EE      EH      HE      HH
$E(Z_1)$, First Stage       1.49    2.55    3.90    5.51
$E(Z_2)$, Second Stage      3.00    3.34    9.75    8.51

Source: From Greeno, 1970.


the alternative hypothesis. Pagel tested this assumption in its most general
form, using the unrestricted form of the two-stage model which has seven
identifiable parameters, and a general form of the all-or-none model which
has free initial probabilities of the states giving four identifiable parameters.
The results showed clearly that the all-or-none model should be rejected as
a description of learning in Pagel's experiment. There were four experimental
groups, and the values of $-2 \log_e \lambda$ for the likelihood ratio test
summed to 402.45. With 3 degrees of freedom per group, that statistic should
have been distributed approximately as $\chi^2(12)$ if learning had really
been all-or-none. The smallest of the four values was $\chi^2(3) = 10.31$,
p < .05.

The fact that learning required two stages fits nicely with the hypothesis
that the main stages are storing pairs and learning to retrieve the stored
representations. The stimulus materials used by Pagel made discrimination
quite difficult in some of the conditions, and would be expected to require a
second stage of some difficulty. Pagel's result thus agrees with Polson, Restle,
and Polson's (1965) finding of two-stage learning in conditions where stimulus
discrimination is difficult. As a general rule, the conditions needed to give
data in agreement with the all-or-none model include having a small number
of responses (two or three), as well as a relatively short list of stimuli that are
quite distinctive. With only a dozen or so items, and with just two or three
responses, the task is quite similar to a sorting task, in which the subject
must learn only in which category each stimulus belongs. It seems intuitively
reasonable that problems of information retrieval would be minimal in that
kind of task; therefore, it seems consistent with storage-and-retrieval notions
that all-or-none learning should occur in the kinds of experiments where it
has been found, and not in situations like Pagel's, where response learning
as such is no problem but where stimulus conditions produce difficulty in
retrieving stored information.

The four groups in Pagel's experiment involved factorial variation of
stimulus meaningfulness and similarity; examples of the stimulus lists are
given in Table 4-8, along with the theoretical measurements of difficulty.
$E(Z_1)$ and $E(Z_2)$ are given (from Equations 4-6 and 4-7), because some
simplifying assumptions were not acceptable in all groups. Note that both stages
were apparently affected substantially by the varied stimulus properties.
Table 4-8 Lists and Measurements of Difficulty from Pagel's Experiment

            High Meaning      Low Meaning       High Meaning       Low Meaning
            Low Similarity    Low Similarity    High Similarity    High Similarity
            Law               Gac               Ban                Yaj
            Jam               Laj               Bin                Yij
            Get               Yeb               Pan                Zaj
            Bed               Xed               Pin                Zij
            Six               Kih               Bat                Yaf
            Kin               Riw               Bit                Yif
            Fur               Zuf               Pat                Zaf
$E(Z_1)$    1.16              1.44              2.26               5.26
$E(Z_2)$    1m                2.59              7.73               14.66


Furthermore, when stimuli were highly similar, meaningfulness played an
important role in difficulty of both stages. Recall from discussion earlier in
this chapter that the usual small effect of stimulus meaningfulness can be
explained by assuming that subjects encode partial representations of stimuli.
With highly similar stimuli like those used by Pagel, partial encodings are
not sufficient to distinguish the stimuli. In this case, when more complete
representations of stimuli must be stored, meaningfulness of stimuli
apparently plays a strong role in determining how easy it is both to store and
to make retrievable representations of the associations.


HUMPHREYS AND YUILLE'S MEASURES OF 
EFFECTS OF CONCRETENESS 


Another set of measurements of difficulty in two learning stages was
obtained by Humphreys and Yuille (1971). The variable manipulated was the
[...] which words can be related to a pictorial [...] rated for concreteness
by Paivio, Yuille, and Madigan (1968). In the experiment, concreteness of
stimuli and responses was varied factorially, with 25 pairs being
learned by each subject.

An important result obtained by Humphreys and Yuille was that with
concrete stimuli, learning occurred in essentially an all-or-none fashion, with
a very low probability of correct response (about .10) for unlearned items.
The two-stage model was required [...] abstract words as stimuli.
One interpretation is that [...] concrete stimuli had only one
[...] second stage occurred with probability close to
[...] the inference that retrieval is [...] one since a concrete
word on the stimulus side would be expected to make easy contact with a
relational encoding of the pair.

The parameters estimated from Humphreys and Yuille's data are given in
Table 4-9. Values of a estimated for groups with concrete stimuli are the
estimated learning rates obtained from the all-or-none model. The effects of
concreteness on the first stage were small and not significant. But strong
effects of both stimulus and response concreteness were obtained in the second
stage. On the interpretation given here, concreteness of stimuli had a very
large effect, raising the probability of accomplishing the second stage
essentially to 1.0. A significant effect of response concreteness was also
obtained when stimuli were abstract. (The parameters given in Table 4-9 were
obtained with the identifying restriction c = d, which seems reasonable since, in
Humphreys and Yuille's experiment, subjects studied all the pairs, then were
tested on all the pairs; they did not have a study trial on each item
immediately after its test as they would in the anticipation procedure.) The
significant effect that was obtained involved the quantity u = c/(q + pc),
which was different in groups Abstract-Concrete and Abstract-Abstract,
$-2 \log_e \lambda$ = 13.63, df = 1, p < .001.


Manipulations of response difficulty by Humphreys and Greeno (1970)
and in Greeno's replication of Humphreys' experiment failed to produce
changes in difficulty of the second stage of learning, and this lack of effect
seems to support the idea that the second stage of learning is a process of
making stored representations more retrievable. Humphreys and Yuille's
result is consistent with that interpretation, especially if it is accepted that
stimulus concreteness had a stronger effect on the same process. Here the
involvement of response concreteness occurred when stimuli were words,
rather than numerals or letters and letter pairs, whereas in the earlier studies,
response difficulty had effects only in the first stage. A major assumption of
the cognitive analysis of association is that the encoded representation of
a stimulus-response pair is relational, affected by properties of both terms.
The finding that concreteness of responses affects ease of retrieval can be
interpreted as an indication that abstract stimulus words were encoded in
ways that made them better retrieval cues when the subject was encoding [...]

Table 4-9 Parameter Values Estimated from Humphreys and Yuille's Experiment

                         Concrete Stim.    Concrete Stim.    Abstract Stim.    Abstract Stim.
Parameter                Concrete Resp.    Abstract Resp.    Concrete Resp.    Abstract Resp.
a (First Stage           .34               [...]             [...]             .37
   Learning)
c (Second Stage          ~1.0              ~1.0              .42               .33
   Learning)
p (Intermediate          [...]             [...]             [...]             [...]
   State Performance)


EFFECTS OF STIMULUS AND RESPONSE
MEANINGFULNESS


й ине: деа ur 
Intralist similarity. E 
em го 
as both stimuli апа responses; i 
nonsense trigrams as responses: one with nonse 


following: Tyz-Nail, Daq-Flag, Vec-Dish, Byl-Goat, Gax-Pipe, Rug-Wings 
Hyj-Leaf, Mef-Yard. 


io sae he items Were presented by = 
anticipation method; that 15, the stimulus was shown while subjects type 


responses on the keyboards t to the computer, or merely 


cate they did not know. After all subjects had 
simplifying assumptions that were statistically unacceptable, but none with
p < .001. The simplifying assumptions used were (1) t = ab, s = 1 − a,
r = e, and (2) b = d, e = q. In two of the groups estimates will be presented
under both of these restrictions. In the group with word stimuli and word
responses, both restrictions were acceptable, p > .10, while in the group with
nonsense stimuli and word responses, both restrictions were statistically
unacceptable, .001 < p < .01. The results are shown in Table 4-10. All the
estimates shown use the identifying restriction c = d.

The outcome appears to show that meaningfulness of both stimuli and
responses had effects on first-stage difficulty, although the effect of response
meaningfulness appears to have been somewhat greater. This agrees with the
idea that subjects must encode the entire response but they can use a simpler
encoding for nonsense stimuli, perhaps using only one letter instead of
remembering the entire trigram. On the other hand, these data show a somewhat
greater effect of stimulus meaningfulness on the first stage than Pagel's results
did in the condition where stimuli were dissimilar. Effects on the first stage
were approximately additive; that is, using nonsense responses rather than
words added about as many trials to the first stage when stimuli were words
as when stimuli were nonsense trigrams.

Meaningfulness appears to have a much different kind of effect on the
second stage. Differences between conditions having words either as stimuli
or responses or both were small but, when both stimuli and responses were
nonsense, the time needed to accomplish the second stage was approximately
doubled. The point taken from Humphreys and Yuille's study, that difficulty
of learning to retrieve can be a joint function of stimuli and responses, is
made stronger by these estimates. One interpretation is that when either of
the terms of an association is a meaningful item for the subject, subjects can
form an organized system for retrieving the items, based on the meanings
of the items and associations involving other words in memory. Only when
both items are meaningless does the effect on difficulty of retrieval become
strong.

Table 4-10 Estimates of Difficulty in Stages of Memorizing Associations 

Stimuli     Responses     First Stage: E(Z1)     Second Stage: E(Z2) 
Word        Word          1.42, 1.51             2.24, 2.45 
Nonsense    Word          3.01*, 2.14*           2.46*, 2.81* 
Word        Nonsense      4.50*                  2.45* 
Nonsense    Nonsense      0                      5.49 

Where two estimates are listed, they were obtained under the two simplifying 
restrictions, (1) t = ab, s = 1 − a, r = e and (2) b = d, e = q. 
*denotes a simplifying restriction with .001 < p < .01. 
SUMMARY AND CONCLUSIONS 

We discussed the formation of complex ideas, an important issue in motivating 
the development of associationist theory. We believe that complex ideas are 
probably relational structures, rather than bundles of component ideas 
linked by undifferentiated connections, a view that is inconsistent with the 
classical associationist analysis, which required that complex ideas develop 
through the simple connection of elementary components. We then conclude 
that the associationist analysis fails to accomplish the goal of explaining the 
development of complex ideas from a system in which only sensory elements 
are available initially and only undifferentiated connections are built between 
them. A more sophisticated system, capable of comprehending relations and 
generating relational structures, is apparently needed. 

Although rote memory was originally studied b[...] recent analyses have been 
influenced by behaviorist [...] associations as connections between stimuli 
and responses. In this framework, the process of association has been studied 
by using the laboratory task of paired-associate memorizing. It has been 
recognized that this is not usually a single process, and we have reviewed 
hypotheses about components [...] rote memorization of lists, it [...] because 
memorizing an association requires complete [...] easier when the responses 
are already [...] or when there is opportunity for practice on the responses 
before the paired associates are presented. However, these facts do not imply 
that learning a response and learning a connection between that response and 
a stimulus are separate processes. A more plausible alternative is that the 
main task is to store [...] that includes sufficient informat[...] 

Next, we discussed sti[...] stimuli have indicated that subjects often [...] 
partial representations of stimuli in memory and, like other ph[...] attention, 
this fact is problematic for associationist theory. [...] posed that retrieval 
of associations occurs through an analysis of stimulus features, such as an 
EPAM net. In this kind of system it is natural to expect parti[...] of stimuli, 
omitting stimulus features not n[...] also be expected that selected components 
would tend to come from a single, easily identified category, making 
retrieva[...] efficient. 

A major effort in this chapter was the presentation of results obtained when 
a two-stage Markov model is used to analyze acquisition of paired associates. 
According to the model, the process of learning each individual item consists 
of two transitions. When the first transition occurs, the probability of a 
correct response changes from zero to p, a parameter that is estimated from 
data. When the second transition occurs, the probability of correct response 
becomes unity. 
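
The two transitions described above can be made concrete with a small simulation. This is only a sketch under simplifying assumptions: the per-trial transition probabilities `u` and `v` are hypothetical names introduced here, at most one transition is allowed after each trial, and the fuller parameterization used in the chapter (a, b, c, d, and so on) is not reproduced.

```python
import random


def simulate_item(u, v, p, n_trials, rng):
    """Simulate correct (1) / incorrect (0) responses for one item.

    u, v: hypothetical per-trial probabilities of the first and second
          transitions (illustrative names, not the chapter's parameters).
    p:    probability of a correct response between the two transitions.
    """
    state = 0                   # 0 = nothing stored, 1 = stored, 2 = reliably retrievable
    p_correct = (0.0, p, 1.0)   # response probability in each state
    responses = []
    for _ in range(n_trials):
        responses.append(1 if rng.random() < p_correct[state] else 0)
        # Assumption: at most one transition can follow each test trial.
        if state == 0 and rng.random() < u:
            state = 1
        elif state == 1 and rng.random() < v:
            state = 2
    return responses


def expected_trials_per_stage(u, v):
    """Mean number of trials spent in each stage (geometric waiting times)."""
    return 1.0 / u, 1.0 / v


if __name__ == "__main__":
    rng = random.Random(0)
    print(simulate_item(u=0.3, v=0.4, p=0.5, n_trials=12, rng=rng))
    print(expected_trials_per_stage(0.3, 0.4))
```

In this simplified form, the mean time in each stage plays the role of the stage-difficulty estimates E(Z1) and E(Z2) reported in the chapter's tables.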

We have interpreted the findings as supporting the idea that in the first 
stage a representation of the stimulus-response association is stored in 
memory; further learning that is accomplished in the second stage supports 
reliable and efficient retrieval of the association. The main assertion of the 
Gestalt analysis of association is that the process involves formation of new 
mental units consisting of relational structures having the associated elements 
as components. We identify this process of finding a relation that integrates 
the stimulus and response terms as the first stage of learning. The 
representation of the association must include a complete representation of the 
response term, sufficiently organized to permit production of the response. 
However, we view the integration of the response not as a process separate from 
association, but as a part of the subject's task in forming an organized 
representation of the stimulus-response pair. 

We suppose that once a representation of the association is stored in 
memory, there is some probability that the items can be retrieved on a test. 
This probability depends on the situation. It can be very nearly 1.0, and when 
it is, learning is approximately an all-or-none process. More often, the 
probability of retrieval after storage of a representation is substantially 
below 1.0, and further learning is required before the item becomes 
sufficiently retrievable to meet the experimental criterion of learning. 
We assume that the second stage of learning consists of incorporating the 
item into a network of feature tests that eventually permits efficient retrieval 
of all the items in the list. 

The major findings obtained by applying the Markov model to paired-associate 
data seem quite compatible with this interpretation of the stages 
represented in the model. First, the difficulty of accomplishing the first stage 
of learning was strongly influenced by properties of both the stimulus and 
response terms, supporting the hypothesis that the first stage in learning a 
new association involves storing a relational representation of the 
stimulus-response pair. Next, large differences in the difficulty of the second 
stage were produced by varying stimulus similarity, in agreement with the idea 
that the second stage consists of learning to retrieve stored items reliably, 
especially if retrieval learning is assumed to consist of incorporating an item 
into a retrieval network composed of tests of stimulus features. In Humphreys 
and Yuille's experiment, pairs with concrete words as stimuli were 
apparently retrieved with sufficient ease so that a second stage of learning was 
not required. Apparently the semantic features of concrete words provide for 
efficient retrieval, either because they are trivially easy to form into a 
retrieval network or because the individual associations are distinctive and 
accessible enough so that little or no interitem structure is needed for 
retrieval. 

Meaningfulness of stimuli usually has very small effects on difficulty of 
paired-associate memorizing, but in Pagel's experiment, stimulus meaning- 
fulness, when stimuli were very similar, had a substantial effect on difficulty 
of the second stage. We interpret this to indicate that when a nearly complete 
representation is needed to make stimuli distinguishable, there is a strong 
advantage in having stimuli familiar enough to allow the subject to recognize 
them as units. Features of a familiar word have previously been used by the 
subject to identify the word in many contexts; therefore, it should not be 
surprising that the subject can more easily incorporate those features into 
a retrieval network than the features of an unfamiliar string of letters. 

Two variables concerning response difficulty that had marked effects on 
the first stage of learning had little or no effect on the second stage. The 
pronounceability of nonsense responses in Humphreys’ experiment and the 
word frequency of the relatively abstract responses used in Greeno’s replica- 
tion of Humphreys’ study had negligible effects on the second stage, and these 
results agree with the idea that difficulty of learning to retrieve should depend 
mainly on characteristics of the stimuli. However, in Humphreys and Yuille’s 
study, concreteness of responses had a small effect on the second stage and 
in Greeno’s transfer experiment, meaningfulness of responses in the first 
lists learned made a sizable difference in the second stage of learning when 
stimuli were nonsense, but not when stimuli were words. 

We believe that an interpretation can be given for the effects of response 
meaningfulness and concreteness on the ease of learning to retrieve. The effect 
of response meaningfulness in Greeno’s experiment interacted strongly with 
the effect of stimulus meaningfulness. Compared with pairs of words, neither 
word-nonsense pairs nor nonsense-word pairs were substantially harder in 
[...] conjecture would be [...] a word, the initial encoding of the [...] 
the trigram with a word [...] of semantic features of the words that [...] 
storing the trigram-word pairs substantiall[...] greater contact between 
trigrams and word schemata were made, then the stimuli in that condition would 
be effectively more meaningful, and their advantage over trigram-trigram pairs 
in the second stage could be explained by the availability [...] or by the 
subject's familiarity with retrieval of the words that [...] in the associative 
representation. Response concreteness had [...] on retrievability, and also 
could be indirectly produ[...] difference in the relational encodings caused by 
the nature [...] 


chapter 5 

Positive Transfer of Association 


This chapter is concerned with categorical concepts as analyzed in 
associationist theory. We begin with a brief discussion of the general problem 
of abstract ideas, then deal with positive transfer when different associations 
include members of the same categorical concept and when new associations 
are learned more easily because the learner can identify the concept the 
various items share. 


ABSTRACT IDEAS 

The traditional associationist theory explained how persons acquired 
abstract concepts with two ideas. According to one idea, an abstract concept 
is simply a super-complex concept made up by combining complex concepts. 
For example, the concept “dog” is the combination of many properties; so 
too are the concepts “cat,” “squirrel,” and “lion.” Beyond this level, the 
concept “animal” combines all the properties in the concepts of its subsets. 
In this combination, or bundle, theory, abstract ideas occur through a 
process of still further association: association among properties that have 
already been associated in concepts at a lower level. 

The other idea assumes a process of generalization: an abstract concept 
consists of those properties that are shared by the ideas that are its special 
cases. In the generalization theory, an inverse process of association occurs 
in which the relevant properties of a concept are identified and separated from 
the respective lower-level concepts to form more abstract ideas. 

An analysis of acquisition of abstract ideas was given by Hull (1920). 
Hull trained subjects on a series of lists containing several paired-associate 
items. Each stimulus term was a form similar to a Chinese ideogram; each 
response was a nonsense syllable. The items in successive lists were related; 
successive items that had the same response also had some component of 
the stimulus in common. It should be possible for a subject to use the 
similarity among stimuli to make the learning easier, generalizing the nonsense 
labels learned in earlier lists to the new stimuli on the basis of the shared 
components. This occurred, and Hull's experiment provides an illustration of 
how an abstract concept is acquired. The subjects learned to identify a 
stimulus category that they had not been familiar with, and the induction 
occurred because, through generalization, a set of stimuli all were given the 
same label. 

An analysis given by Underwood (1952) provides a more recent 
associationist analysis of abstraction. Underwood's theory considered a task 
called verbal concept formation, in which stimuli are words that can be 
classified by a common characteristic. For example, “barrel,” “doughnut,” and 
“moon” all refer to round things and can be classified on that basis. 
Underwood's theory is based on the fact th[...] with the objects’ names. In 
th[...] each word are [...] strength and, because the basis [...] with all the 
members of its [...] more than other responses, an[...] enough strength that the 
subject [...] classifying the stimuli. 

[...] in the stimuli before they can be used for classification. In Hull's 
[...] the features of stimuli that are [...] the categories; perception of 
those features is assumed in the learni[...] in an important sense the 
associationist theory, [...] showing how abstract concepts are formed, explains 
how subj[...] they already know. Many [...] seem to depend on [...] rather than 
on simple perceptual attributes. To apply the associationist analysis to 
abstraction in cases where the defining features include relations [...] 
relations. Thus, we find it problematic whether the hypothesis of 
associationis[...] 

Abstract Ideas in Cognitive Theory 

Systems that form abstract ideas have been developed rather fully in 
domains where the abstraction consists of a single perceptual attribute or a 
set of attributes combined by conjunction and disjunction. Hunt’s Concept 
Learning Systems (Hunt, Marin, & Stone, 1966), cited in Chapter 2, are 
examples of one kind of a number of artificial-intelligence systems that induce 
categorical concepts that consist of combinations of defining features. 
Another form of categorical information is simple knowledge of objects that 
belong to categories. This kind of information is represented in semantic 
networks such as Quillian's (1968) model. It has been shown now, notably 
by Rosch (e.g., 1973), that both these forms of categorical knowledge are 
important in the way human knowledge is organized. A model that represents 
categorical knowledge as a network of relations can explain some facts about 
the answering of questions; for example, it takes less time to verify “a robin 
is a bird” than to verify “a robin is an animal” (Collins & Quillian, 1969). 
However, knowledge about the typical features of objects in a category also 
affects our performance in answering questions, as is seen by the fact that 
it takes less time to verify “a robin is a bird” than to verify “a penguin is 
a bird” (Rips, Shoben, & Smith, 1973). 

Inducing relational patterns is also important in acquiring abstract ideas. 
An important contribution to the theory of pattern induction was given by 
Simon and Kotovsky (1963), who studied the process of solving 
series-extrapolation problems. This process apparently requires background 
knowledge about the normal order of symbols used in the task (the alphabet of 
letters or the sequence of numerals) as well as ability to identify rules for 
applying relations in a complex sequence. 

Another induction task was studied by Huesmann and Cheng (1973), who 
presented sets of numbers and required a subject to induce a mathematical 
formula that connected all the sets. Induction in this task seemed to depend 
strongly on the subject's prior knowledge of a well-specified set of relations 
(adding, subtracting, multiplying, and so on) and a systematic strategy for 
generating and testing possible rules. There has also been some preliminary 
study of the principles needed to identify the grammatical rules that generate 
a body of sentences (Anderson, 1975; Hamburger & Wexler, 1975). All these 
cases seem to require of the learner considerable background knowledge of 
a relational kind as a prerequisite for acquiring new abstract concepts and 
patterns. Although our objection to the associationist analysis of acquiring 
abstract ideas is informal, as were our doubts regarding the acquisition 
of complex ideas, we also consider it implausible that the mechanisms for 
acquiring categorical concepts and relational patterns are generated from 
simple connections among elementary sense impressions. 
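
The generate-and-test strategy described for formula induction from sets of numbers can be illustrated with a short sketch. It is not a reconstruction of any published procedure: the candidate relations, the triple format `(x, y, z)`, and the function name `induce_rules` are all assumptions made here for illustration. The point is only that induction of this kind presupposes a well-specified set of relations to test.

```python
import operator

# Hypothetical background knowledge: a small, well-specified set of relations.
RELATIONS = {
    "x + y": operator.add,
    "x - y": operator.sub,
    "x * y": operator.mul,
}


def induce_rules(examples):
    """Return the names of all candidate relations consistent with every example.

    Each example is a triple (x, y, z); a candidate f survives the test
    only if f(x, y) == z for every triple.
    """
    return [
        name
        for name, f in RELATIONS.items()
        if all(f(x, y) == z for x, y, z in examples)
    ]


if __name__ == "__main__":
    print(induce_rules([(2, 3, 6), (4, 1, 4)]))   # only multiplication survives
    print(induce_rules([(2, 2, 4)]))              # one example leaves two candidates
```

A single example can leave several rules standing; further examples eliminate candidates, which is why a systematic strategy for generating and testing rules matters.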
CONCEPTS IN ASSOCIATIVE LEARNING 

Experimental analyses of categorical concepts by associationists have used 
items that can be classified into a few categories; the subjects’ success in the 
task depends on identifying those categories and using them to classify the 
items. When the task involves learning of associations, as in Hull's (1920) 
experiment, the categorical relations among items bring about positive 
transfer, in that previous learning has a facilitating effect on learning in 
subsequent situations. 


Our view of positive transfer is an extension of our hypothesis about 
memorizing individual associations. We say that a major process in learning 
an association is finding a relational property that makes a cognitive unit, or 
Gestalt, of the pair of items. We propose that a major process leading to 
transfer of association is finding relational properties involving groups of 
associations. We say, in other words, that transfer depends on association at 
a higher level, and we presume that the principles of relational organization 
that operate to form pairs are probably the same as the principles involved 
in forming the higher-order groupings involved in transfer. 

The simplest case of positive [...] the same response. Let A [...] that A-B 
has already been l[...] to A'-B if the subject notic[...] two items together. 
When [...] other presentation [...] experiment, requiring [...], in which a 
distinct [...] treat the two similar [...] appearance, but similar in meaning. 
A subject may learn t[...] “gex” to the stimulus word “table.” Then, when 
the word “chair” is presented, a basis exists for the subject to group the 
association “chair-gex” with “table-gex,” because “table” and “chair” are 
associated in the subject’s semantic memory. More indirectly, if “butter” and 
then “canary” are associated with the same response, subjects can group the 
two stimuli because both name yellow things. 

The hypothesis we take regarding positive transfer, then, is that when 
pairs with the same response have stimuli that can in some way be grouped, 
subjects can associate the response with those properties or meanings shared 
by the stimulus group. Consequently, associations need not be learned and 
remembered individually, but may be treated as a set. Tasks involving positive 
transfer will generally involve a mixture of learning events that occur at two 
or more levels. Individual associations will be learned; rules involving relations 
among associations will also be learned. Analysis of these situations involves 
identifying and describing the learning that occurs at the different levels. 

In associationist theory the concept of generalization explains positive 
transfer between associations. Again, the simplest case involves two 
associations having the same response and physically similar stimuli. Once a 
response has been associated with one of the stimuli, if the other similar 
stimulus is shown, a subject is likely to give the same response to this second 
stimulus as well. In associationist theory, there is some strength of 
association that generalizes to the new item, and the strength of the 
generalized association depends on the degree of similarity between the 
stimuli. In the system of concepts used by associationist theory it is quite 
reasonable to postulate that stimuli that look alike or sound alike will give 
rise to stimulus generalization. For example, the activity in the nervous system 
generated by a particular pattern of sound waves becomes connected to a certain 
response. When another stimulus is presented, involving a very similar pattern 
of sound waves, the neural activity generated by this second stimulus probably 
has much in common with the neural activity that is associated with the 
response, so much so that the response is likely to occur when the new but 
similar pattern of sound waves energizes the subject’s ear drum. 

But what of generalization that occurs because of meanings? Recall that 
a major point of classical associationist theory was to show that complex 
cognitive structures, including complex and abstract concepts, could be 
generated within a system whose only raw material was disorganized sensory 
experience. This gives strong motivation for postulating mechanisms that 
could produce basic cognitive achievements such as transfer of response 
based on meanings of stimuli. The mechanism used in recent associationist 
theories is a process of mediation. 

When transfer occurs between words that are associated, the postulated 
mechanism involves forming an associative chain. Suppose a subject has 
learned “table-gex”; then a new item, “chair-gex,” has to be learned. An 
association, “chair-table,” already is known by the subject, and this gives rise 
to a chain of associations, “chair-table-gex.” By utilizing the association 
already established, the subject only needs to learn to suppress the middle 
item, “table,” rather than to learn a new association from “chair” to “gex.” 



What if the stimulus members of two associations are not as directly 
linked in the subject’s memory as are “table” and “chair,” but do have 
similar meanings, as do “butter” and “canary”? Mediation gives a possible, 
though less direct, basis for this case also. For associative transfer to occur, 
say, from “butter-mur” to “canary-mur,” there must be [...] linking from 
“canary” to “mur,” providing some initial associative [...] to make the 
learning easier. It could occur from “canary” to “yellow,” and then to 
“butter,” giving a four-item chain [...] with two mediating links. Or it could 
occur if in learning “butter-mur” some associative strength was implicitly 
given to “yellow-mur,” [...] strong associates of “butter” occur implicitly as 
responses whenever “butter” is seen. 
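
The idea of a mediating chain can be sketched as a search through a store of existing associations. This is an illustrative sketch only: the directed links in `ASSOCIATIONS` and the function name `find_chain` are assumptions introduced here, and breadth-first search is simply one convenient way to find the shortest chain, not a claim about the psychological mechanism.

```python
from collections import deque

# Hypothetical store of already-learned, directed associations.
ASSOCIATIONS = {
    "chair": ["table"],
    "table": ["gex"],
    "canary": ["yellow"],
    "yellow": ["butter"],
    "butter": ["mur"],
}


def find_chain(associations, start, goal):
    """Return the shortest chain of associations from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in associations.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    print(find_chain(ASSOCIATIONS, "chair", "gex"))    # one mediating link
    print(find_chain(ASSOCIATIONS, "canary", "mur"))   # two mediating links
```

The interior items of a returned chain are the mediators that, on the mediation account, the subject must learn to suppress.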


The cognitive and the associationist views of transfer have much in common: 
transfer depends on shared properties of stimuli or on shared associations in 
memory. However, they differ in conception in [...] important way. In the 
cognitive view, transfer occurs because of a [...] ability to organize 
experience and depends on finding appropriate [...] properties for grouping 
associations into higher-level units. In the associationist view, transfer 
occurs because of generalized strength of association. 

[...] implications about the process of transferring a response to a new 
stimulus. The hypothesis of generalized strength suggests that entirely new 
items will require learning from zero strength, while items that are related to 
previously learned items begin with [...] amount of strength, depending on 
their similarity to already learned items. On the other hand, the cognitive 
[...] that transfer to new items should occur in an all-or-none f[...] subject 
recognizes a relationship between the new item and one or more items 
previously learned, then the response should be known on the basis [...] 
grouping. If a relationship is not seen, then the new i[...] from an item that 
has no relationship to previously learned i[...] 



The idea that transfer of [...] an all-or-none fashion was tested in an 
experiment by Greeno and Scandura (1966). The materials used involved 
categories bas[...], in the sense mentioned earlier in this chapter. Table 5-1 
[...] lists of materials learned in training and one of the lists given in 
tra[...] 

Subjects in this experiment first memorized a list of seven associations, 
as shown in List 1 in the table. Subjects were told that items having the same 
response could be related to one another, and that these relationships would 
help in a later task. Subjects studied the items in List 1 and then were given 
a brief test to assure that the items had been memorized. Next the items in 
List 2 were presented, one at a time. [...] subject was asked [...] each trial; 
the experimenter then ga[...] 
Table 5-1 Items Used in Transfer Experiment 

        List 1                      List 2 
Stimulus       Response      Stimulus     Response 

Freckle        Pel           Atom         Pel 
Earthworm      Pel           Sulphur*     Pel 
Tweezer        Pel           Ivory        Mur 
Grasshopper    Pel           Alley*       Mur 
Paste          Mur           Globe        Dix 
Sheep          Mur           Beak*        Dix 
Knuckle        Dix 

*denotes control words, unrelated to the category concepts, which in this 
list were small (Pel), white (Mur), and round (Dix). 
Source: From Greeno & Scandura, 1966. 


[...] produce varying amounts of transfer was important because the two views 
about transfer give different expectations about the way in which transfer 
facilitates learning. According to the all-or-none idea, different lists of 
items may include different proportions of items that benefit from transfer 
but, for any single item, transfer either occurs completely or not at all. 
According to the idea that transfer involves some quantity of associative 
strength, each item benefits from transfer to some extent, and different sets 
of items will have different average amounts of transfer. The analysis of 
results allows a decision as to which kind of effect produced the differences 
in the amount of transfer that occurred. 

The main difference in amount of transfer was between transfer items and 
control items, which were selected to involve little or no relationship with the 
categories used in List 1. A second variable, illustrated in Table 5-1, was the 
number of examples in List 1 for each category: of the three categories used, 
one was represented by four examples, one by two examples, and the third 
by a single example. A third variable used to produce different amounts of 
transfer is called the dominance of category examples. The materials were 
selected using measurements taken by Underwood and Richardson (1956). 
To obtain measurements, words were shown to subjects, who were asked to 
respond with associations in the category of sense impressions. For example, 
an appropriate association for “table” might be “flat” or “square,” but the 
usual free association, “chair,” was not appropriate in this task because 
“chair” is not a descriptive term applying to tables. In selecting materials for 
an experiment, words for which subjects gave the same response are put 
together in a category. For example, “barrel,” “doughnut,” and “moon” are 
all words to which many subjects respond by saying “round.” They are then 
used together in a category as examples of the concept “round.” The 
dominance of an example for a concept is defined as the percentage of subjects 
who gave the concept in Underwood and Richardson's association test. Thus 
[...] involve categories that are quite obvious; examples with low dominance 
involve categories that are difficult to form. [...] selected for this 
experiment with [...] and items of low dominance were selected with 
association frequencies between 10% and 20%. The examples shown in 
[...] dominance in List 1 and high dominance in List 2. 
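
The definition of dominance as a percentage of subjects can be written out directly. The sketch below uses made-up responses purely for illustration; the function names are hypothetical, and the 10% to 20% band is the only cutoff taken from the text.

```python
def dominance(responses, concept):
    """Percentage of subjects whose sense-impression response was `concept`."""
    if not responses:
        raise ValueError("no responses given")
    matches = sum(1 for r in responses if r == concept)
    return 100.0 * matches / len(responses)


def is_low_dominance(percent):
    """True if the value falls in the 10%-20% band used for low-dominance items."""
    return 10.0 <= percent <= 20.0


if __name__ == "__main__":
    # Hypothetical responses of 20 subjects to one stimulus word.
    responses = ["round"] * 3 + ["heavy"] * 10 + ["wooden"] * 7
    d = dominance(responses, "round")
    print(d, is_low_dominance(d))
```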
To summarize the experimental design, each subject learned a transfer list 
containing six items, three of which w[...] In other words, both training 
dominance and transfer dominance were varied in a 2 x 2 factorial design. 
There were [...] subjects in each of four groups. Of the three transfer items 
in each list, one was an example of a concept that had four examples in the 
training list, one was an example of a concept that had two examples in the 
training list, and one was an example of a concept that had just one example 
in the training list. 

[...] the learning process, Figure 3-7 shows [...] transition probabilities. 
The discussion up t[...] probabilities of the states, in ordinary learning 
[...] probability of transition into State L [...] c, on all trials. From 
Figure 3-7, c1 = c2 = c. The remaining parameter is f, the probability of 
losing an item from short-term [...] For this analysis, assume that d, the 
probability of transiting to State L while other items are presented, is zero. 

[...] an issue about parameters [...] involves which parameter or parameters 
[...] transfer. If transfer is an all-or-none [...] difference betwe[...] will 
appear as difference[...] consist only in different proportions of items known 
at the start. If transfer involved generalization of associative strength, then 
either the learning probability c, or the short-term retention parameter f, or 
both, should also be involved in transfer effects. Of course, an important 
possibility is that the learning process would be more complicated than the 
one described in the all-or-none model. However, if it appears that learning 
is all-or-none, then the question about transfer takes the precise form stated 
above. 


Goodness of Fit 

First, we must determine whether the learning process can be described 
adequately by the all-or-none model. The distributions used to test goodness 
of fit were the number of errors and the trial of last error. The exact 
expressions for these statistics are generalizations of Equations 3-1 and 3-2. 
The important new feature is seen in the probability of no errors: 

P(T = 0) = t + (1 − t) [...]    (5-2) 

Note that the probability of no errors depends strongly on t, the probability 
of transfer. Other terms of the distributions have formulas that include effects 
of short-term retention as well as transfer, but have the same form as 
Equations 3-1 and 3-2. 


[Equation 5-3: expressions for the remaining terms of the distributions; not legible] 

The comparison between theoretical and empirical proportions is given in 
Figures 5-1 and 5-2. [...] distributions were obtained using maximum 
likelihood estimates of the parameters, which were .197, .207, and .698. The 
fit appears to be satisfactory, and the conclusion seems to be warranted that 
learning was approximately all-or-none. 


Invariance of Parameters 
The question of all-or-none transfer can be answered by examining invariance 
of parameters between groups having different amounts of transfer, as 
mentioned previously. To test the hypothesis using the greatest amount of 
information possible, all the transfer items were combined in one set of items 
and compared with the entire set of control items. Three tests are of main 
interest. First, consider the transition parameters of the model, c and f. 
A likelihood ratio test for invariance of both these parameters has two 
degrees of freedom; the test statistic was 1.92, which is well below the level 
indicating significance in the chi-square distribution. Second, a test for 
invariance of t between control and transfer items gave a highly significant 
test statistic: χ²(1) = 59.60, p = nil. A third test was for the hypothesis that 
t was equal to zero in the control i[...]othesis was acceptable in the data: 
χ²(1) = 1.09, p ≈ .20. Th[...] favors the idea that transfer [...] of 
generalization all located [...] 

Figure 5-1 Theoretical and empirical distributions of trial of last 
error from Greeno and Scandura's (1966) experiment. 

Another analysis was carried out, showing the result in another way.
According to the hypothesis of all-or-none transfer, any item for which there
is one or more errors can be identified as an item for which transfer failed to
occur. All such items should be alike, whether they were transfer or control
items. This can be examined by looking at conditional distributions: that is,
if items having zero errors are removed, the remaining items should give
identical distributions, whether control or transfer items are considered.
Figure 5-3 shows conditional distributions of the number of errors, given at
least one error. Note that in each panel, the distributions are virtually identical
for items having corresponding responses from List 1.
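The conditional comparison described here can be sketched in a few lines. This is an illustration only: the error counts below are hypothetical, not data from the experiment, and the function name is an assumption of the sketch.

```python
from collections import Counter


def conditional_error_distribution(error_counts):
    """Return P(k errors | at least one error) as a dict keyed by k."""
    nonzero = [k for k in error_counts if k > 0]   # drop errorless items
    if not nonzero:
        return {}
    counts = Counter(nonzero)
    total = len(nonzero)
    return {k: counts[k] / total for k in sorted(counts)}


# Hypothetical numbers of errors per item (one entry per item).
transfer_items = [0, 0, 0, 0, 1, 1, 2, 3]
control_items = [0, 1, 1, 1, 1, 2, 2, 3, 3]

print(conditional_error_distribution(transfer_items))
print(conditional_error_distribution(control_items))
```

In this made-up case the transfer items have many more errorless items than the control items, yet the two conditional distributions coincide, which is the pattern that all-or-none transfer predicts.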
Figure 5-2 Theoretical and empirical distributions of number of
errors from Greeno and Scandura's (1966) experiment.

Figure 5-3 Conditional distributions of k errors, starting with k = 1, on
transfer and control items for which the response had been paired with one,
two, or four examples during training in Greeno and Scandura's (1966)
experiment.
The evidence obtained by comparing transfer and control items gives
quite strong support for the idea of all-or-none transfer. In our view, the
result favors the cognitive idea that transfer depended on a subject's having
in mind an appropriate relationship that permitted grouping of the new
transfer item with the correct item or items from List 1. The idea of
generalized response strength seems hard to justify, considering the apparent
all-or-none character of transfer that was observed.


Theoretical Analysis of Values of t 


A final analysis was concerned with the varying amounts of transfer that
were related to the different degrees of dominance of the examples of concept
categories, and with the different numbers of examples used in List 1 for the
concept categories. A very simple model was used in this analysis, and while
[...] appear to be shown in the data, the results were not sufficiently far
from the data to give strong reasons for modifying the model. The assumptions
used include the view that individual items are learned in an all-or-none
fashion. When an individual item is learned, it is encoded in memory in some
way. The assumption is that each possible encoding either is or is not related
to the other items in the concept category; that is, there is a certain
probability that a successful encoding for the item will represent the
property that is used as the basis of a conceptual category in the experiment.
Let c be the probability of learning an item, that is, of achieving an encoding
of the item in memory that allows good performance on the item in the
experiment. We assume that [...] a concept will be acquired [...] that a takes
two values: $a_H$ for high-dominance items and $a_L$ for low-dominance items.
Let k be the number of examples in the training list. Then the probability
of acquiring the concept when List 1 was memorized would be

$$A_{ik} = 1 - (1 - a_i)^k,$$

where i = H or i = L and k is the number of examples [...].

$$t = A_{ik}\, b_j, \qquad (5\text{-}4)$$

where j = H or j = L.

Parameters were estimated and Equation 5-4 was used to obtain values of
t; these were in turn used in Equation 5-2 to predict the frequencies of items
with no errors in the 12 transfer conditions in the experiment. The results are
shown in Table 5-2. The predicted frequencies came from the estimates
$a_H = .387$, $a_L = .147$, $b_H = 1.000$, $b_L = .660$.

Table 5-2 Theoretical Values of t and Predicted and Obtained Frequencies of Zero
Errors

Condition
Training     Test         No. of               Predicted    Obtained
dominance    dominance    examples     t       frequency    frequency
High         High         4           .858     21.05        17
High         High         2           .624     16.15        21
High         High         1           .387     11.20        10
High         Low          4           .566     14.95        14
High         Low          2           .412     11.72         9
High         Low          1           .255      8.45        12
Low          High         4           .471     12.97        15
Low          High         2           .273      8.83         7
Low          High         1           .147      6.21         2
Low          Low          4           .311      9.63        13
Low          Low          2           .180      6.89         6
Low          Low          1           .097      5.16         2

Source: From Greeno and Scandura, 1966.

The results show some reasons for suspicion of some of the assumptions.
With low training dominance, relatively large overpredictions were made for
the cases having only one training example, and this causes suspicion that
acquisition was probably facilitated by the simultaneous presence of more
than one example. However, with only 24 cases per cell, the predictions were
not significantly discrepant from the data by a chi-square test (χ²(7) = 13.75,
p > .05).
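The theoretical values of t in Table 5-2 can be reproduced from the model in a few lines. This is a sketch under stated assumptions: the probability of transfer is taken to be the probability of having acquired the concept, 1 - (1 - a)^k, times the probability b of recognizing the test item; a for low-dominance training is set to .147 (the tabled t for one low-dominance training example with a high-dominance test) and b for low-dominance tests to .660 (the value implied by the ratios of the tabled t values). The function and variable names are hypothetical.

```python
def transfer_probability(a, b, k):
    """Probability of transfer after k training examples of a category.

    a: probability that learning one example yields the concept
    b: probability that the test item is recognized as a new example
    """
    acquired = 1.0 - (1.0 - a) ** k
    return acquired * b


# Parameter values assumed for this sketch (see the lead-in).
A = {"High": 0.387, "Low": 0.147}   # by training dominance
B = {"High": 1.000, "Low": 0.660}   # by test dominance

for training in ("High", "Low"):
    for test in ("High", "Low"):
        for k in (4, 2, 1):
            t = transfer_probability(A[training], B[test], k)
            print(f"{training:5s} {test:5s} {k}  t = {t:.3f}")
```

The printed values agree with the t column of Table 5-2 to within rounding in the third decimal place.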

ACQUISITION OF RULES FOR GROUPING

Greeno and Scandura's evidence was quite strong on the issue of how
transfer occurred: that is, it was all-or-none. However, strong evidence was
not obtained about how the learning that provided the basis for transfer
occurred. The hypothesis used to analyze different amounts of transfer was
that properties corresponding to the conceptual categories were or were not
encoded, on an all-or-none basis. However, this was tested only weakly in
Greeno and Scandura's experiment. We now turn to an experiment by Polson
(1972), in which the process of acquiring conceptual categories was studied
in detail.
We assume it is because the subject [...] property or rule that allows the
transfer item to be grouped with the [...]. [...] process by which relational
groupings are [...] has been discussed in some detail by many Gestalt
psychologists, notably Katona (1940), Köhler (1927), and Wertheimer (1959).
The idea of [...] through insightful discovery (Yerkes, 1927) and the
acquisition of relations [...] of learning theory (e.g., Hilgard & Bower, [...]).

[...] involved subjects memorizing a [...] of five lists. The five lists given
to the first two groups had items that [...] related on the basis of conceptual
categories. The five lists given to the other [...]. [...] the
concept-acquisition [...] had categories based on superordinate concepts. The
categories were kinds of furniture (chair, table, and so on), parts of the body,
insects, fruits, articles of clothing, geographical features, means of
transportation, and animals. The examples used were taken from normative [...]
obtained by Cohen, Bousfield, and Whitmarsh (1957), in which subjects were
given the names of categories and wrote down examples of the [...].
The other concept-acquisition condition, called CA, Sense, had categories
based on concepts involving sense impressions (Underwood & Richardson,
1956). The categories were "soft," "red," "round," "dark," "white," "small,"
"smelly," and "green." The measured dominance of the examples used in
the CA, Sense condition was above 50%.

Each list in the CA conditions included one example of each category, with
each word paired with one of the numbers 1-8. The five members of a
category were all paired [...] response in the five lists. ([...] might be
number 1, all the parts of the body [...]. Different subjects had [...]
responses to the categories.) The lists given to the control groups had words
that were phonemically similar [...] lists. For example, in the control [...],
the word "plume" was used in place of "plum," a member of the category
"fruit." Care was taken to balance the conditions in word length and word
frequency of the items used.

Each subject was given the items of [...] list [...] method. The items of
each list were [...] the subject gave only correct responses for three
successive presentations of the list. Twenty-four subjects were run in each of
the four experimental conditions.


All-or-None Learning and Transfer


The main question in this experiment is the process by which the categorical
groupings are acquired. However, this question is precise only in relation to
a definite model of the process of learning and transfer. The simplest case for
analysis has both learning and transfer that are all-or-none, and Polson tested
these ideas in his experiment.

The hypothesis of all-or-none learning was tested using the predicted
distributions of errors and trial of last error (Equations 5-2 and 5-3) and other
theorems that are implied by the Markov model of all-or-none learning with
short-term retention. The agreement of the data with the model was quite
good; satisfactory agreement was found with distributions of the number of
errors, the trial of last error, and some sequential statistics. There was
evidence that performance prior to learning was not stationary, as the model
stipulates, but this was largely confined to the first list learned in the
experiment. However, the extent of the discrepancy in Polson's results did not
seem sufficiently large to justify analysis using a more complex model,
especially since the main questions of the investigation involved the later
lists, where transfer was possible.

The hypothesis of all-or-none transfer was examined by estimating the 
values of parameters in the model of all-or-none learning. Figure 5-4 shows 
Figure 5-6 Estimated prelearning performance in Polson's (1972)
experiment.


difference between conditions in the learning parameters for the various 
lists. The estimates of c increased across lists, indicating that subjects became 
more skillful in memorizing items as the experiment proceeded. However, 
this generalized transfer was just as great in the control conditions as it was 
in the concept-acquisition conditions. This result corresponds to the finding 
of Greeno and Scandura: that items in a transfer condition on which the
subject did not perform perfectly did not differ from comparable control
items.

As Figure 5-6 shows, there was an increased probability of holding unlearned
items in short-term memory in the concept-acquisition conditions relative
to the control conditions. This means that there was a transfer effect
influencing performance prior to learning, in addition to the transfer of
association that is measured by the transfer parameter t.

We do not have a satisfactory explanation of this transfer effect on
performance before learning. One possibility is that short-term retention is
facilitated if the list contains a number of items that have already been learned.
If, say, one-half the items were known by a subject on the basis of transfer,
[...] containing only four [...] items. [...] subject with a control list would
have eight unlearned [...]. [...] is reasonable to suppose that presentation
[...] of an unlearned item, and Calfee and Atkinson [...] analyzing effects of
list length on learning parameters. This argument would lead to a prediction
about [...] lists [...] obtained by Polson (1967). In a [...] list, some
transfer and some control items are included in the same list, [...] case in
Greeno and Scandura's (1966) experiment. Polson (1967) included mixed-list
conditions in his original study, and obtained a similar difference between
transfer and control items to the one shown in Figure 5-6.


Of course, an alternative explanation of the transfer effect is that it
indicates a transfer of associative strength. We think it does not, because we
have not been able to think of a way in which transferred associative strength
could produce a decrease in f without at the same time producing an increase
in c.

We conclude that the hypothesis of all-or-none transfer was supported
reasonably well in Polson's (1972) results. The outcome is not as [...]
supportive as was Greeno and Scandura's (1966), in that Polson's data
showed a transfer effect on performance prior to learning on untransferred
items. However, that effect was relatively small, a difference of .20 to [...]
in the performance [...] compared to effects of .50 to .80 in the transfer
parameter. And [...] there was no evidence of any difference between control
and transfer [...] untransferred items [...] absence of [...] occurred, [...]
a more cogent datum, [...] transfer represents the best available general
conclusion to be taken from available results.
Theory of Acquisition

The theoretical analysis of [...] given by Greeno and Scandura (1966) was
tried also [...]. [...] was close, but differed from the data in an important
[...]. We present the simple theory here, for [...] the successful analysis
carried [...] of it. Recall that in the simple hypothesis, there is [...] a
that the concept-category is [...]. [...] assumed that when [...] there is a
probability b that the relationship is recognized [...]. Polson added the
assumption that if a transfer item is not recognized, the concept will be
reacquired with probability a when the item is learned.
There was only one member of each concept-category in each list. The
question is whether the theory can give an account of the value of t in each
list. Let $t_i$ be the value of t in the i-th list. The assumptions given above
imply the following.

According to this theory, each item in a list has four states: State U, where
the item is not known at all; State S, where the item is in short-term memory;
State L, where the response for the item has been learned, but where the
concept-category is not known; and State A, where the concept-category has
been acquired. At the beginning of the i-th list, an item is in State A with
probability $t_i$; otherwise it is in State U. The probabilities of transition
among the states are

$$
P = \begin{array}{c|cccc}
  & A  & L      & S          & U \\ \hline
A & 1  & 0      & 0          & 0 \\
L & 0  & 1      & 0          & 0 \\
S & ca & c(1-a) & (1-c)(1-f) & (1-c)f \\
U & ca & c(1-a) & (1-c)(1-f) & (1-c)f
\end{array}
\qquad (5\text{-}5)
$$

Each list was presented until the subject learned all the items; thus, each item
is absorbed either in State A or State L at the end of each list. The probability
of absorbing in State A at the end of the i-th list is

$$P_i(A) = t_i + (1 - t_i)a.$$
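The absorption probability can be checked numerically by running the chain of Equation 5-5 until essentially all items are absorbed. This is a minimal sketch: the values of c, f, and the starting probability are hypothetical, a is only an illustrative value, and the function names are assumptions.

```python
def transition_matrix(c, a, f):
    """Transition probabilities with states ordered A, L, S, U."""
    unlearned_row = [c * a, c * (1 - a), (1 - c) * (1 - f), (1 - c) * f]
    return [
        [1.0, 0.0, 0.0, 0.0],    # State A is absorbing
        [0.0, 1.0, 0.0, 0.0],    # State L is absorbing
        list(unlearned_row),     # transitions from State S
        list(unlearned_row),     # transitions from State U
    ]


def advance(dist, matrix):
    """Apply one trial: new_dist[j] = sum_i dist[i] * matrix[i][j]."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def absorption_in_A(t_i, c, a, f, n_trials=2000):
    """Probability of ending in State A, starting in A with probability t_i."""
    dist = [t_i, 0.0, 0.0, 1.0 - t_i]
    matrix = transition_matrix(c, a, f)
    for _ in range(n_trials):
        dist = advance(dist, matrix)
    return dist[0]


c, f = 0.2, 0.7      # hypothetical learning and forgetting parameters
a, t_i = 0.488, 0.4  # illustrative values
print(absorption_in_A(t_i, c, a, f))
print(t_i + (1 - t_i) * a)
```

The two printed numbers agree, and the result does not depend on c or f, because an item that starts in State U reaches State A with probability a whenever it is eventually learned.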


Recall that the probability of transfer is the probability of knowing the
concept, times b, the probability of recognizing a new example. That is,

$$t_{i+1} = P_i(A)\,b = [t_i + (1 - t_i)a]\,b = ab + t_i(1 - a)b. \qquad (5\text{-}6)$$

The initial value of t, $t_1$, is zero, since the subject cannot transfer until
at least one chance has been given to acquire the concept. For List 2, the
probability of transfer is $t_2 = ab$, and in general,

$$t_i = ab \sum_{j=0}^{i-2} [(1 - a)b]^{j}. \qquad (5\text{-}7)$$

Equation 5-7 can be proved by induction using the recursion given in 5-6.
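The agreement between the recursion and the summed form can be checked numerically. The sketch below assumes the recursion t(i+1) = ab + t(i)(1 - a)b with t(1) = 0, and the summed form ab times the sum of [(1 - a)b]^j for j from 0 to i - 2; the parameter values are illustrative and the function names are hypothetical.

```python
def transfer_by_recursion(a, b, n_lists):
    """Return [t_1, ..., t_n] from the recursion t_{i+1} = ab + t_i (1 - a) b."""
    values = [0.0]                       # t_1 = 0: no chance yet to acquire
    for _ in range(n_lists - 1):
        values.append(a * b + values[-1] * (1 - a) * b)
    return values


def transfer_closed_form(a, b, i):
    """Return t_i = ab * sum_{j=0}^{i-2} [(1 - a) b]^j."""
    ratio = (1 - a) * b
    return a * b * sum(ratio ** j for j in range(i - 1))


a, b = 0.488, 0.935   # illustrative values
recursive = transfer_by_recursion(a, b, 5)
for i, t_i in enumerate(recursive, start=1):
    print(i, round(t_i, 3), round(transfer_closed_form(a, b, i), 3))
```

Each row prints the same value twice, once from the recursion and once from the sum, which is the numerical counterpart of the induction argument.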
The parameters of Equation 5-7 were estimated and used to predict the
proportions of items with zero errors using Equation 5-2. Estimates for the
superordinate concepts were $\hat{a} = .488$, $\hat{b} = .935$; for the
sense-impression concepts, $\hat{a} = .343$, $\hat{b} = .632$. The values of
$t_i$ obtained when these parameters were used to predict the proportions of
items with zero errors, using [...]

[...] would occur if the process [...] concepts had two stages, and Polson
developed a two-stage [...] that agreed with the results satisfactorily. In the
two-stage model, [...] assumed to begin in a state where they do not know what
kind [...] to look for in the materials. In this state of uncertainty about
the [...] of groupings to look for, the probability of acquiring a concept is
[...] low. However, when transfer occurs for a concept, the subject is
assumed to transit to a state in which she or he knows the general kind of
relationship that is involved in the groupings, and the probability of acquiring
concepts increases.
Polson's theory is conceptually straightforward, but it involves a
considerable technical development, involving analysis of the state of a
subject, as well as the states of the individual items being studied. The
subject has two states, $S_0$ and $S_1$, and each item has States A, L, S, and
U, as before. When the subject is in State $S_0$, the transitions among states
for an item have probabilities given in Equation 5-5, with $a = a_0$. When the
subject transits to $S_1$, the transition probabilities for items change, being
the probabilities in Equation 5-5 but with $a = a_1$.
As Polson formulated the theory, a subject would not stay in State $S_0$
unless no concept categories were known. This implies that the following
states are sufficient to characterize subjects:

$$S_0,\; S_{1,0},\; S_{1,1},\; \ldots,\; S_{1,N},$$

where $S_{1,n}$ means that the subject is in State $S_1$ and transfer occurs on
n of the items, and N is the total number of concept-categories. Let
$q_{n,N}(p)$ be the binomial probability of n events in N independent trials,
where the probability of the event is p; that is,

$$q_{n,N}(p) = \binom{N}{n} p^{n} (1 - p)^{N-n}.$$

[...] the probability of state k on [...]
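The binomial probability used in this development is easy to compute directly. A minimal sketch follows; the function name is hypothetical, N = 8 is taken from the eight categories in each list, and the value of p is purely illustrative.

```python
from math import comb


def binomial_probability(n, N, p):
    """Probability of exactly n events in N independent trials with event probability p."""
    return comb(N, n) * p ** n * (1 - p) ** (N - n)


N = 8      # number of concept-categories in each list
p = 0.45   # illustrative probability of transfer for one item

distribution = [binomial_probability(n, N, p) for n in range(N + 1)]
for n, prob in enumerate(distribution):
    print(n, round(prob, 4))
print("total:", round(sum(distribution), 6))
```

The probabilities over n = 0, ..., N sum to one, so they can serve as the distribution of the number of items on which transfer occurs when items transfer independently.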
[...] about the performance given by a subject [...] strong and interesting
[...] individual items. Polson's test of the idea was based on analysis of the
random variable $X_i$, the number of items for a subject [...] the i-th list
that had no errors. Let $z_i$ be defined as

[...]

the probability of zero errors for an item that did not transfer. (Recall that
the parameters c and f were not constant across lists.) Define $P_i(S_0)$ and
$P_i(S_{1,n})$ as the probability of state $S_0$ or $S_{1,n}$ on the i-th list.
Then the probability distribution of the variable $X_i$ is

[...]    (5-9)

[...] maximum-likelihood estimates of the parameters [...] the model, using as
data the empirical distributions of $X_i$, i = 2, [...]. [...] in state
$S_{1,0}$ was allowed in the analysis, but the estimated probability of this
event was zero. Thus, the parameters of the model are $a_0$, $a_1$, and b. The
estimated values [...] of acquiring the concepts were [...]
[...] that have been learned previously and [...] during the transfer task.
A more complicated situation [...]

Figure 5-10 Proportions of errorless items obtained, and predicted
by the two-stage theory.
Batchelder considered two assumptions about the interaction between
learning of two related items. One model uses the assumption that interaction
between the items depends on the subject's acquiring a rule involving use of
a property or properties shared by the stimuli. This rule-acquisition model
uses the same idea as was used by Greeno and Scandura and by Polson in
accounting for transfer. The other model Batchelder considered is based on
ideas developed in stimulus sampling theory (Atkinson & Estes, 1963).
Because it assumes a mixture of processes involving association to stimulus
patterns and transfer based on components of stimuli, it is called the mixed
model. According to this model, learning of one item does not affect learning
of the other item; however, when one of the items is learned there is an
increased probability of the correct response on tests of the other item.

A strong feature of Batchelder's analysis was use of the pair of related
items as the unit of analysis. Each state of the system specifies the condition
of a pair of items: thus, the states are (U, U), (U, L), (L, U), (L, L), and A.
U denotes an unlearned item, L denotes an item that has been learned, and
A denotes the state where the subject has acquired a rule relating the two
items. A change in state can occur when either of the two items is presented.
Thus, the learning theory is specified as a pair of operators, one giving the
transition probabilities when the first item is presented, the other giving
transition probabilities when the other item is presented. According to the
rule-learning theory, the transition probabilities are
It is assumed that on a test, if an item is in State U, the subject will guess, and
be correct with probability g. If an item is in State L, the subject will give the
correct response on a test. And if the pair of items is in State A, the subject
gives the correct response to either item.

The pattern-components mixed model has the transition operators

In this model, if the pair is in State U, U, the subject guesses on either item
that is tested. If the pair is in State U, L or L, U, the subject will be correct
on a test of the item in State L, and on the item in State U the probability of
correct responding is b + (1 - b)g, giving the transfer effect on performance.
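The response rule of the mixed model can be written as a small function. This is a sketch of the performance assumptions only, not of the learning operators: the function name, the state labels as strings, and the numerical values of g and b are hypothetical.

```python
def p_correct_mixed(item_state, partner_state, g, b):
    """Probability of a correct response to one item of a related pair.

    item_state, partner_state: "U" (unlearned) or "L" (learned)
    g: probability of a correct guess
    b: probability that the learned partner supports the correct response
    """
    if item_state == "L":
        return 1.0                   # a learned item is always correct
    if partner_state == "L":
        return b + (1.0 - b) * g     # transfer from the learned partner, else guess
    return g                         # both items unlearned: pure guessing


g, b = 0.25, 0.6   # hypothetical values
for item, partner in (("U", "U"), ("U", "L"), ("L", "U"), ("L", "L")):
    print(item, partner, p_correct_mixed(item, partner, g, b))
```

With b = 0 the function reduces to independent all-or-none learning of the two items, which makes clear that b carries the whole transfer effect in this model.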

The two models were used to analyze the results of five experiments. In all
the experiments, stimuli were nonsense forms originally constructed by
Gibson (1940). Responses were English words on which subjects were given
pretraining, and a card showing all the responses was available to the subject
throughout the experiment. The lists contained pairs of associations having
similar figures as stimuli, associated with the same response. Four of the
experiments used the anticipation procedure, with the total list divided into
two sublists, each containing one of the two related members of each pair of
associations. In each cycle of presentations of the list, all the items in the
first sublist were presented, followed by all the members of the second sublist,
with the orders of items within sublists randomized on different presentations.
The fifth experiment also divided the items into sublists, but gave all the items
for study, followed by tests on all the items, then another study cycle, and
so on.


The results of the experiments are given in Tables 5-3 and 5-4. Estimated
values of c varied from .22 to .46 in the five experiments. The values of b for
the mixed model ranged from .63 to 51. Values of a in the rule-learning
model were estimated between .53 and .77. Batchelder found that the assump-
Table 5-3 Predicted and Observed Values of the Mean and Variance of the Total Error
Distributions


Experi- | = = X 
meni ET) VT) E(T) VT) E(T) VT) 
[o D ЕБИНЕ ЕА = 79 :33 21 
Г Mixed — a e 75 35 32 
ule pr Е 2 7 
ate 35 т 72 75 38 33 
it à 1.47 = 1.66 .66 .70 
Mixed = 172 = 1.75 71 74 
Rule = 179 124 1.96 2B 90 
Data 151 j 98 53 
n abi D 91 = .98 PES. .54 
Міхей = ‘97 == 1.09 .54 .54 
Rule he 109 86 98 58 л0 
Data Бе 341 = 3.26 1.08 1.65 
IV Mixed = 2.27 — 3.30 1.08 1.65 
Rule 2.09 2.34 1.63 1.96 1.05 1.17 
Data = 2 
à M 77 :52 TI == 40 
v me E 181 :52 81 — 40 
ule ‘66 52 70 3 
Ба 52 66 2 30 38 
Source: From Batchelder, 1971.
Table 5-4 Predicted and Observed Values of the Mean and Variance of the
Trial of Last Error Distribution

Experiment         E(L)    V(L)    E(L)    V(L)    E(L)    V(L)
I      Mixed       1.36    1.88    1.03    1.98    .41     .58
       Rule        1.16    1.34    .87     1.26    [?]     [?]
       Data        1.15    1.29    .87     1.25    .44     137
II     Mixed       [?]     [?]     1.81    4.40    [?]     [?]
       Rule        1.82    2.80    1.49    2.79    .92     1.30
       Data        1.80    3.16    1.56    3.59    .85     .94
III    Mixed       1.52    2.20    1.18    2.39    .63     .96
       Rule        1.41    1.47    .97     1.43    .66     1.05
       Data        1.41    1.79    1.00    1.55    .67     397
IV     Mixed       3.35    5.68    3.08    7.52    1.55    3.97
       Rule        2.51    5.43    2.08    5.39    1.55    1.86
       Data        2.57    5.80    2.04    [?]     1.50    [?]
V      Mixed       .75     1.86    .75     1.86    .35     .61
       Rule        .58     1.07    .58     1.07    .35     .44
       Data        .63     1.20    .63     1.04    .35     [?]

Source: From Batchelder, 1971.
[...] rule learning. However, the overall picture shows fairly good agreement
with the data for the predictions of both models.


SUMMARY AND CONCLUSION 


We have considerable skepticism regarding the success of associationism
in explaining the acquisition of abstract ideas. It appears that associationist
analyses have explained only identification of concepts that were known
previously. Thus, emergence of an abstract category seems to require prior
knowledge of the property or relation used to identify members of the
category, and induction of a relational pattern requires prior knowledge of the
relations and general structural features of the pattern.

Apart from these general difficulties, our results raise serious doubts about
the validity of associationist explanations of phenomena that occur in the
specific kinds of transfer experiments that have been designed to illustrate
associative mechanisms in concept formation. We have described three
investigations involving positive transfer of association. All of these have been
based on the idea that learning and transfer involve discrete events in a
subject's state of knowledge: transfer in these analyses consists simply of an
item's being known because another item had been learned and the two items
could be grouped according to some shared property or meaning. The results
do not constitute a hard refutation of associationist ideas; concepts of a
threshold and of variability in initial response strength undoubtedly could be
developed to make association theory compatible with these findings, just
as they were in the case of all-or-none results regarding memorizing of
individual items (Restle, 1965). On the other hand, these analyses treating
transfer as an all-or-none event based on success in finding relational
properties fit naturally in the cognitive framework, and thus support the
plausibility of that view.
An important technical feature of these analyses is the development of
methods for analyzing learning that occurs at more than one level and for
measuring the parameters of that learning. The general problem of analyzing
the acquisition of knowledge surely involves investigation and analysis of
complex structures, including structures of rules and individual associations
as well as more intricate relational patterns. The successful application of
rigorous analytical methods to systems involving as many as three levels of
learning, as in Polson's (1972) analysis, argues well for the continued
development of methods for testing definite and productive hypotheses about
significant learning processes.


chapter 6

Negative Transfer

Suppose a subject has learned a list of associations; denote the list A-B.
Now a new list is presented; stimuli and responses may be all new
(C-D), or stimuli may be the same and the responses new (A-C), or [...]
stimuli may be paired with the old responses (C-B), or the old stimuli and
responses may be used, with the [...] rearranged to create new [...]
(A-Br). A-C is a harder transfer [...] C-D. The relative difficulty [...]
C-B and A-Br, to C-D depends [...] responses used. If responses
are hard (for example, [...] syllables), then it is an obvious advantage
to be able to use the same responses in the transfer list as in the initial
training list. But A-Br is always harder to learn than C-B.

[...] about negative transfer can be interpreted in terms of the
cognitive theory of association developed in Chapter 2 and specified further
in Chapter 4. Recall that in this theory, the process of learning a new
association can be divided into two main component stages: storing a
representation of the pair of elements to be associated, and learning to
retrieve the pair reliably when the stimulus is presented on a test. Negative
transfer could be expected in either of these stages.

Consider the process of storing representations of the pairs to be learned
in the transfer list. First, note that when responses are not meaningful for
the subjects, if responses from the first list are also used in the second list,
there should be an advantage in the first stage of transfer learning. The reason
for this was mentioned in Chapter 4, where a cognitive theory of response
learning was sketched. That is, storage of a stimulus-response pair will be
harder if the response must be stored as a string than it will be if the
response can be stored as a single element, because it is easier to build a
new associative structure with a unitary response than it is to build a new
associative structure with a response that must be integrated. Therefore, when
nonsense syllables are used as responses, there should be an advantage in the
first stage of transfer learning for C-B and A-Br conditions, relative to C-D
and A-C, respectively.

The negative transfer that can occur in the first stage of transfer is due to
the presence of the same stimuli as were used in List 1. Recall that the stored
representation is a structure involving a relation between the stimulus and
response, suggesting a bias toward encoding features of the stimulus that are
relevant to a relationship with the response. Evidence of such bias has been
obtained in experiments (Ellis & Shumate, 1973; Voss, 1972; Weaver, 1969).
In A-C or A-Br transfer, the stimulus term in each transfer pair has previously
been encoded as part of a different association. This means that the stimulus
term has features represented in memory that were selected in the context of
storing and retrieving the initial A-B association.

When the subject has to store a representation of the transfer pair, three
possibilities may occur. First, the features used to represent the stimulus in
the A-B association may fit easily into a relation involving the new response,
so that the subject may simply encode the new association representing the
stimulus in the same way as it was represented in List 1. Although that event
would probably cause some difficulty in retrieval, there is no reason to expect
a disadvantage in storage of the association. In fact, the opportunity to use
a stimulus representation that had been developed previously might
conceivably facilitate storage of new associations in some circumstances.

A second possibility is that the features initially used to represent the
stimulus do not fit easily into a relational structure, but the subject ignores
the initial representation and recodes the stimulus using new features. It is
not obvious that subjects could easily ignore representations developed earlier
for stimulus terms unless they had some systematic basis for classifying
stimulus components. However, if representations can easily be found
involving new stimulus features, then the storage of a representation could
occur as easily in A-C or in A-Br transfer as in the C-D control condition.

The third possibility is that there would be interference in the storage phase
of transfer learning. This final case would involve an initial representation of
the stimulus that does not easily form a relationship with the second-list
response, and a tendency by the subject to use the initial stimulus
representation in storing the new association. We have called a tendency for
subjects to continue using old representations persistence in encoding (Greeno,
James, & DaPolito, 1971). There is some [...] persist with [...] difficulty in
[...] thorough [...] the response than vice versa. [...] we will present later,
we now suppose [...] and must be integrated [...].

In the second stage, prior learning that involves the same stimuli used in
transfer may either facilitate or [...] attend to distinctive features of the
stimuli, see Ellis & Muller, 1964.) We suppose that the organizational [...]
usually be more difficult if the stimuli have [...] stimuli that are helpful in
retrieving the [...]
[...] in retrieving pairs after the responses have been changed.
[...], it should be possible to arrange pairings that would make use of
[...]; an example might be a first list with the responses A, B,
[...], F, and G and a second list with the responses 1, 2, 3, 4, 5, 6, and 7
assigned to the stimuli in the same order. However, in the conditions generally
used in transfer experiments we suppose organization acquired in the first
list usually makes it harder to acquire an efficient retrieval system for the
second list.
We now consider effects in Stage 2 produced by having the same responses
that were used in the first list. The contrast is between A-C, where new
responses are used, and A-Br, where the List-1 responses are re-paired with
the List-1 stimuli. We should expect that interference during Stage 2
should be strongest in the A-Br situation. For each item, a new
stimulus-response unit has been formed that must be fit into the retrieval
system. If different features of the stimulus are involved in the second
encoding from the first one, and if the new features overlap with those
used to encode other items, then some rearrangement will be needed for
successful retrieval of List-2 items. In addition, relationships between
different responses used in A-Br may have induced some groupings of items
in List 1 that interfere with retrieval in the A-Br list. Two similar
responses might be paired with stimuli having some shared feature, and this
feature would most likely be included in the retrieval network for the
items. But the responses assigned to those similar stimuli after re-pairing
might be quite dissimilar, and the interpair grouping that was useful in
List 1 could lead to confusion in organizing List 2.

By comparison, the situation in A-C could be relatively less difficult in
the second stage. The subject has two alternative strategies available for
building a new retrieval system, and both of them should be easier to carry
out if new responses are used. One strategy for the subject is simply to
set aside the retrieval system used in List 1 more or less intact. This
would seem especially likely in a situation involving compound stimuli with
easily classified components, for example, colors and words. Subjects could
choose one kind of component in List 1 and shift to the other in List 2.
This would make the task of building a new retrieval system essentially
independent of the existence of the network acquired for List 1.

A second strategy available for acquiring a retrieval network for List 2 is
to keep the List-1 network active, modifying it as needed to allow
retrieval of the items in List 2. Subjects could incorporate new features
not used in retrieval for List 1, or could modify the structure so that new
items would be retrieved on the basis of the same feature tests.

The hypothesis that subjects set aside the List-1 retrieval system in A-C
learning is similar to Postman's (1963a) and Underwood and Schulz' (1960)
idea of response-set selection, mentioned in Chapter 4. Conceptually, the
ideas differ because the hypothesis of a new retrieval system involves a
recoding of the stimuli, while response-set selection involves only the
availability of responses. However, both ideas would explain less negative
transfer in A-C on the basis of greater independence between List 1 and
List 2. In our earlier theorizing (Greeno, James, & DaPolito, 1971), we
proposed the hypothesis of an independent retrieval system to explain the
empirical finding of less negative transfer in A-C than in A-Br in the
second stage. An independent retrieval system would be easier to construct
in A-C because a new set of responses is involved.

Further thought has led us to believe that it is probably much more common
for subjects to adopt the second strategy for dealing with the transfer
list; that is, we now believe that subjects generally keep the first-list
retrieval system active and modify it to retrieve the List-2 items. For one
thing, use of the second strategy conforms to the principle of persistent
encoding, which is necessary in our framework for explaining negative
transfer in the first stage of List-2 learning. Another consideration is
empirical; Schneider and Houston (1969), among others, obtained evidence
that subjects used the same components of stimuli to encode items in
List 2 as they had in List 1.


If subjects modify 
what implications are 


E nei 
f Е x items 
ponents of stimuli to encode 


it aside: 

rather than set 3 par^ 
V rA Can 

there regarding the relative difficulty of A-C ar 


sas that 
S es tha 
f persistent encoding impli 


" than the other sai 
2. It would not be surpris 
n retrieval using most of the same featur é 
the list and when the new response is 01 

Situation, Second, whatever groupings 9. 
ause of stimulus везане 

are features would probab : 
аге re-paired: therefore th 

%0 as to avoid use of those 
ation of 4 retrieval network 


ith responses that sh 
produce confusions in List 2 after the items 


retrieval network would probably be modified 
stimulus features. We would expect the modific. 




in A-Br to involve a relatively greater number of features at higher levels
in the network, and would expect a tendency in A-C to modify the network at
lower levels, closer to the representations of individual pairs. The
changes required to support retention of A-Br would be expected to create
more difficulty in second-stage learning, since they involve more extensive
recoding of stimuli, including greater change in the organization of the
list, than would be produced by the relatively lower-level changes that we
suppose are typical in A-C learning.


ASSOCIATIONIST THEORY OF NEGATIVE TRANSFER 


In the associationist theory, negative transfer is explained by
interference with new learning caused by existing associations. When A and
B terms have been associated in the first list, it is assumed that three
kinds of connections have been formed. First, there are forward
associations, from A to B. Second, there are backward associations, from B
to A, although it is expected that these are ordinarily weaker than the
forward associations. And finally, there are contextual associations, from
the general stimulus situations to the B responses. In addition to these
associations, the subject has acquired a stable encoding of the A stimulus
and has integrated the B response.

The application of these ideas to analysis of transfer was given by Martin
(1965). When responses in the two lists are the same, the process of
response integration need not be carried out in learning the second list,
and contextual associations that result in response pool formation will
already be in place. Thus, in C-B and A-Br the occurrence of the first-list
responses gives a source of positive transfer, and this factor has more
importance if responses are difficult. On the other hand, backward
associations learned in the A-B task provide a source of negative transfer
in C-B and A-Br. Whether there is negative or positive transfer in C-B and
A-Br compared to C-D depends on the relative importance of response
learning and backward associations. When meaningful responses are used, it
is reasonable to expect that backward associations could be more important,
and produce negative transfer.

When stimuli from the first list are present in transfer in A-C and A-Br,
forward associations from the first list will interfere with second-list
learning. There are possible facilitating effects as well. Battig (1966)
has suggested that when there is strong interference within the initial
learning task, retention and transfer may be enhanced. One possible
mechanism consistent with Battig's hypothesis is stimulus differentiation.
With very similar stimuli in the first list, the subject will learn to
discriminate the stimuli, and this learning could be helpful in the A-C and
A-Br transfer tasks.

In a recent discussion, Martin (1972) has given a careful analysis of
implications of the hypothesis of stimulus-encoding variability concerning
transfer. In a number of his conclusions, Martin has sided with the kind of
cognitive theory that this book is presenting. However, Martin also
discusses mechanisms that are consistent with the form of encoding
variability hypothesis proposed earlier by associationist theorists (see
Martin, 1968). Martin has noted that if the subject uses the same encodings
of stimuli in A-C and A-Br as in the first list, the initial associations
will be elicited and interfere with second-list learning. But if subjects
can encode the stimuli differently for the second list than they did
initially, then associations in the second list should be learned just as
rapidly as in the C-D control condition.
1972) reported an experiment conducted by | sip 
involving compound stimuli, with a test for backward associations мав 
ms following learning of the transfer task. m d the 
us Components were the same for both the B ar 


just as rapidly as in the C-D control condition. 


to 

; as 

uli used are pei dnd д 
. x 1 " 

T should be relatively less severe than if enco 


: veen 
T 15 caused by interference a we 
the stage of learning in ied 
ted. On the other hand, it is not 


make new learni 


assumptions are needed to explain h 


interfering effect. 


fae 
n € strongest P 
а > e given by the subject. 

It is not assumed that a stimulus is connected to only one response, but
rather that there is a habit-family hierarchy (Hull, 1952) based on
differing response strength, and the strongest response is the one that
occurs. If this assumption is made in associative theory, then the cause of
negative transfer is clear; in transfer learning, the correct response
starts lower in the habit-family hierarchy for its stimulus than it would
without the previous learning. Without any initial learning, association to
the correct response has to be strengthened until it exceeds the strength
of other associations that the subject happened to have for the stimulus
when the experiment began. The complete process involves increasing the
strength of the correct association and decreasing the strength of other
associations whose responses occur but are not reinforced. If there has
been previous training for a response that is not now correct, then the old
response starts with much higher associative strength than the correct
association. During practice, the old association will be weakened, as will
any other incorrect associations that manage to recover strength or that
happen to arise in the new context. And the new correct association will,
of course, be gaining in strength as a result of the pairings provided in
the experiment. Eventually, the new correct association will rise to the
top of the habit-family hierarchy where it will dominate performance, as it
must for the subject to achieve the experimenter's criterion of correct
performance. (We have discussed only the interfering effects of forward
associations that occur in A-C and A-Br transfer. A similar story could be
developed regarding backward and contextual associations, although it would
be somewhat more complicated for backward associations because their
influence on forward stimulus-response performance must be indirect.)
According to the hypothesis of learning as acquiring the greatest strength
for the correct response, the process of weakening the initial A-B
associations will occur mainly during the early stages of second-list
learning in A-C and A-Br, since at these stages the first-list associations
are still stronger. It would not be necessary or even very natural to
assume that weakening of the A-B association must occur before the A-C or
A-Br association could start to be strengthened. On the other hand, some
trials would occur in which A-B had sufficiently greater strength than the
new association so that the correct response would have virtually zero
probability of occurrence. Only when the new association was strengthened
and the old association weakened to the point where their strengths were in
the same neighborhood would the new correct response be expected to have
substantial nonzero probability. (Response strengths are assumed to vary;
thus, the term "the same neighborhood" refers to a situation in which the
distributions of strength for the two associations have a substantial
amount of overlap. In this case, the correct response would be stronger on
some trials and the old response stronger on other trials; as a result the
probability of correct response would be greater than zero but less than
one.)
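
The remark about overlapping strength distributions can be made concrete
with a small numerical sketch. It assumes, purely for illustration, that the
strengths of the old and new associations on a given trial are independent
normal variables and that the stronger association determines the response.
The normality assumption, the function name, and the parameter values are
hypothetical and are not part of the theory as stated.

```python
import math


def prob_new_response_wins(mean_new, mean_old, sd_new=1.0, sd_old=1.0):
    """Probability that the new association is the stronger one on a trial.

    Hypothetical assumption: the two strengths are independent normal
    variables, and the stronger association determines the response.
    """
    # The difference of two independent normals is normal with this SD.
    sd_diff = math.sqrt(sd_new ** 2 + sd_old ** 2)
    z = (mean_new - mean_old) / sd_diff
    # Standard normal cumulative distribution function evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # As the mean strength of the new association approaches and passes
    # that of the old one, the probability of a correct response moves
    # from virtually zero, through intermediate values, toward one.
    for mean_new in (-6.0, -2.0, 0.0, 2.0, 6.0):
        p = prob_new_response_wins(mean_new, mean_old=0.0)
        print(f"mean difference {mean_new:+.1f}: P(correct) = {p:.4f}")
```

Under these assumptions the probability of a correct response stays near
zero while the old association is much stronger, and takes intermediate
values only when the two distributions overlap.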
It seems to us that this theory quite clearly implies that the major
observable effect of negative transfer should be a delay of the acquisition
of nonzero probability of the correct response.¹ Once the correct
association has a strength in the same neighborhood as the association from
the first list, the situation would seem comparable to the one that occurs
when a new association is being learned. The statistical methods derived
from the two-stage Markov model enable us to estimate the mean number of
trials that occur before the probability of correct response changes from
zero to some intermediate value, p, as well as the mean number of trials
between that transition and the time when the probability of correct
response becomes 1.0. We derive the expectation from the theory of
associative strength that the major effect of negative transfer should
occur in the first of these stages, and little or no effect of negative
transfer should be found in the measurements of trials needed to complete
learning once the probability of correct response has become nonzero.
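
The idea of measuring the mean number of trials spent in each of two stages
can be illustrated with a deliberately simplified simulation. The sketch
below assumes that each stage ends on any trial with a constant probability
(here called `a` for the first stage and `d` for the second), so that the
time in a stage is geometrically distributed with mean equal to the
reciprocal of that probability. This is an illustrative assumption only; the
model used in this book has additional learning and performance parameters,
and the function names and parameter values here are hypothetical.

```python
import random


def expected_trials(prob_per_trial):
    """Mean of a geometric waiting time: trials until the stage ends."""
    if not 0.0 < prob_per_trial <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return 1.0 / prob_per_trial


def simulate_stage_lengths(a, d, n_items=20000, seed=0):
    """Simulate mean trials in Stage 1 and Stage 2 for many items.

    Simplifying assumption (not the full model): Stage 1 ends on each
    trial with probability a, and Stage 2 then ends with probability d.
    """
    rng = random.Random(seed)
    total_stage1 = 0
    total_stage2 = 0
    for _ in range(n_items):
        z1 = 1
        while rng.random() >= a:  # item stays in the initial state
            z1 += 1
        z2 = 1
        while rng.random() >= d:  # item stays in the intermediate state
            z2 += 1
        total_stage1 += z1
        total_stage2 += z2
    return total_stage1 / n_items, total_stage2 / n_items


if __name__ == "__main__":
    mean1, mean2 = simulate_stage_lengths(a=0.25, d=0.5)
    print(f"Stage 1: simulated {mean1:.2f}, expected {expected_trials(0.25):.2f}")
    print(f"Stage 2: simulated {mean2:.2f}, expected {expected_trials(0.5):.2f}")
```

In this simplified setting, a delay in reaching nonzero probability of the
correct response corresponds to a smaller value of `a`, which lengthens
Stage 1 without changing the expected length of Stage 2.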

While the theory of learning as acquiring the strongest associative
strength provides one clear explanation for negative transfer, other
mechanisms could possibly operate to produce the effect. If the first-list
associations were recalled during second-list trials, less time would be
available for study of the new associations. This could reduce the rate at
which the new associations gained strength over trials. (This hypothesis was
Suggested by Anderson & Bow 


is limited in the tot 
in a small a i 


In 


; A х the 
and would have less capacity for modification to increase the strength of 
new associations. 

All of the mechanisms that we have been able to think of for producing
negative transfer based on interference between associations imply that the
interfering effect of strong associations will be greater, at least on the
average, than the interfering effect of weak associations. If negative
transfer occurs because of recall of the first responses, then such recall
is more probable when

í 3 T GR 
In earlier discussion ch а simpler and, We now see, inadequate vers 
associationist theory in which the first i es "unlearning of the first assoc! 
сре) я У ~ F E АЕ ent О 
í rning of new pairings, and the second stage, “replacement © 
COMI 5, James, & DaPolito, 1971). jus $ 
€ctly pointed ош that no such si ion of bre? 
i ESE Mao: a uch simple notion а 
ing off the A-B and replacing it with A-C was ever assumed by оа of associa 
ation iat the Present discussion gives a fairer picture of at least Eu 
viable form of associationist theory. The assumptions we now attribute to
associationist theory have the same implication regarding the locus of
negative transfer effects as the simplistic view we addressed earlier. On
the other hand, they seem to provide realistic expectations for retention
of first-list responses after second-list learning, the possibility of
probabilistic multiresponse learning, and maintenance of both A-B and A-C
associations under appropriate conditions, such as instructions to use B
responses as mediators.


the first associations are strong. If negative transfer occurs because of
limited capacity for increasing the strengths of new associations and
decreasing the strengths of old ones, then the amount of decrease is
greater when the initial level of associative strength is high. If negative
transfer occurs because the strength of new associations must exceed the
strength of old associations before the new responses can be performed,
then more delay will occur when the old associations start at high
strength. Perhaps there is an associative mechanism that can cause negative
transfer but that lacks the property of having greater effect when the
interfering associations are stronger. As far as we can determine, though,
no such mechanism has been proposed, and since we also cannot think of one,
we remain convinced that associationist theory predicts that the major
effect of negative transfer should occur early in learning, when the
associations learned in List 1 are the strongest, and it would appear as an
increase in the mean number of trials needed to raise the probability of
correct response above zero.


COMPARISONS OF C-B AND A-Br

We have several experimental comparisons involving the paradigms C-B and
A-Br. The advantage of such comparisons is that they should not be
complicated by effects of response familiarity, since in both the C-B and
the A-Br conditions the same responses used in the transfer list were
learned in initial training.


James and Greeno's Experiments 


The first data are from two experiments reported by James and Greeno
(1970). The main results of the analysis of stages have been given in brief
form earlier (Greeno, 1970; Greeno, James, & DaPolito, 1971). This
presentation will give fuller information regarding goodness of fit of the
model and other statistical matters.

In the first experiment, each list contained ten pairs of two-syllable
adjectives, with two groups (one A-Br and one C-B group) learning the first
list to a criterion of one perfect trial with no overtraining (No OT) and
the other two groups learning the first list to the one-trial criterion and
then receiving additional trials of overtraining (OT). In the second
experiment each list contained six pairs of two-syllable adjectives. There
were eight groups in a 2 x 2 x 2 factorial design. One factor was the main
variable, the difference between A-Br and C-B conditions. A second factor
was the presence or absence of pretraining lists (PT or No PT), each with
the same responses used in the last two lists but with different stimuli;
each pretraining list was studied for six trials. The third factor was the
presence or absence of 18 trials of overtraining on the next-to-last list
following a criterion of one perfect trial (OT or No OT). The experiments
were both carried out
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r iments 
Table 6-1 Tests of Simplifying Assumptions in James and Greeno's Experiments


е = 0, Init, Inita, та 
У пета Int, bsd g=q jf =4 
of Items Condition df = 2 df —1 df = 3 L "m 
10 A-B,, No OT 13 =e TU m 
10 С-В, No OT З ; а и 
10 А-В, OT T 8.4** 33.7** a. 
10 C-B, OT 2.0 i] 6.2 15 
6 A-B,, No PT, No OT 157 23 23 E 
6 C-B, No PT, No OT Ll m > n» 
6 A-B,, No PROF 55 8.8** 9.6** M 
с вотот 2 2 3.2 19.18" 
CR PT NSO ag 2 2.9 9.7% 
6 C-B, T, No OT :3 3:3 5 эл, 
6 A-B,, РТ, OT 6 Ep" vs А e 
: е ЗА 1 14,8** 18. 


*denotes p < .05; **denotes p < .01.


| x items fof 
Subjects in each group, giving 200 n 
iment, there were 25 subjects in each £ 


1 the 
i Used for estimating duration of stages ae 
Same as those described in Chapter 4, Recall that the first step invo 


: -tribute 
2 log. 1, which Would be asymptotically distribut 
dom as j 


Goodness of fit was evaluated using the same five distributions considered
in Chapter 4. The results are summarized in Table 6-2, which gives the
goodness-of-fit chi-square statistics obtained for each distribution
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Table 6-2 Goodness of Fit of the Two-Stage Model to James and Greeno's Data 


Errors Errors Trials Trial 
N before after after of 
M ren First First First Total Last 
tems Condition Correct Correct Correct Errors Error 
10 А-В, No OT 18.37 589 1067 1649 28.99 
(18, 13) (9, 7) (12, 10) (16, 13) (18, 13) 
ж ee 
10 С-В, No OT 18.12 242 565 18.52 20.74 
(16, 11) (7,5) (9,7) (13,10) (15,10) 
* * 
10 A-B,, OT "P to ee 2S 
(20, 18) (23,20) (25, 20) 


(18, 13) (16, 14) 
13.75 11.07 20.21 


10 C-B, OT 20.08 747 
ia (09 069 029 (15, 10) 
12.08 158 1892 17:52 


6 А-В,, No PT, No OT 13.68 


(13, 8) (7, 5) (10, 8) (11,8) (14, 9 


6  C-B, No PT, No OT s ud á a 25 у | 5 | es 
IN NO 
6 С-В, No PT, OT atn 20 ma às 6.2 
6 A-B,PT,NoOT (105 he 66 6.9 (m 

3.03 1.99 6.36 3.02 9% 


6 Е 
сыво мәт ga GD 5: ~ ~» 


6 = 16.04 1.45 7.99 21.44 12.92 
а (11,6) (7, 5) (9,7) (10, 7) (13, 8) 
ж жж 
3.68 7.44 5.11 11.29 9.28 
B HER PIS 64 62 69 005 G2» 
* * * 
*denotes p < .05; **denotes p < .01.

tested. Recall that the degrees of freedom for these tests are not well
defined, but bounds can be specified. The numbers in parentheses below each
chi-square statistic are the degrees of freedom on the lower and upper
bounds of the distribution for that test, under the null hypothesis. We
have indicated with asterisks those tests that would involve significant
discrepancies.
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hree 
The twelve experimental conditions gave 60 tests of the model. Three tests
indicated rejection at .05 or below for the upper bounds on degrees of
freedom, while 15 indicated rejection for the lower bounds. In all but
three of the rejections, the empirical distributions had considerably more
variance than the theoretical distributions, and in only one was the
empirical distribution substantially less variable. Thus, it is a strong
possibility that the main cause of discrepancies from the model was
inaccuracy of the simplifying assumption that all items and subjects had
identical parameters.


The model seems to have fit well enough to permit parameter estimates to be
used in estimating the duration of stages of learning, and in testing
hypotheses about which stage of learning showed the greater effect of
negative transfer. An identifying restriction is needed, and we used c = 0
in the analyses. Other restrictions are possible, and c = d and b = 0 were
tried. However, with c = d, solutions for the parameters in three A-Br
groups were not in the (0, 1) interval required for values of
probabilities, and with b = 0, impossible values were obtained for four C-B
groups. Estimates for the five unrestricted parameters are given in
Table 6-3.

The quantities of main interest are the mean numbers of trials needed in
the two stages of learning. Under the restrictions on the initial vector
(and the identifying restriction c = 0), Equations 4-6 and 4-7 simplify to


r 


1 
(2 — 
) o (6-1) 
Е(2,) = (1 — DIE Ux) 
qd 
The values calculated are in Table 6-4.

Table 6-3 Estimated Values of Parameters
Number 
of Items Condition a d d 1 e y 
10 92 o2 фи o 2 
10 2S ME MT 38 
10 do sis b тоо 38 
10 D ова CNET 
6 Рајан ан MEE, 
6 25. a S i al 
6 31 24 .26 57 .28 
6 -39 .39 42 85 235. 
6 A-B,, PT, No OT 58 42 33 34 .29 
6 С-В, PT, No OT PS NA PT .30 
6 А-В, PT. OT -34 .30 33 66 .34 
6 С-В, PT, OT EE UNUS L00  .29 
€ > ee uir Le 
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Table 6-4 Measurements of Difficulty in the Two Stages of Learning 

Number
of Items  Condition             E(Z1)   E(Z2)

10        A-Br, No OT           4.54    4.12
10        C-B, No OT            3.94    2.76
10        A-Br, OT              6.49   10.05
10        C-B, OT               3.50    2.95
 6        A-Br, No PT, No OT    2.58    5.09
 6        C-B, No PT, No OT     2.99    1.28
 6        A-Br, No PT, OT       3.19    4.35
 6        C-B, No PT, OT        2.58    2.70
 6        A-Br, PT, No OT       1.74    3.80
 6        C-B, PT, No OT        1.87    1.55
 6        A-Br, PT, OT          2.92    3.59
 6        C-B, PT, OT           2.48    1.22

Source: From Greeno, 1970
In most of the experimental conditions, the measurements obtained for the
groups in Stage 1 were nearly equal for corresponding A-Br and C-B groups,
whereas in the estimates of difficulty of Stage 2 there were large and
consistent differences between A-Br and C-B groups, with A-Br taking more
trials in every case. These impressions can be checked by testing
statistical hypotheses about the invariance of parameters across
experimental conditions.

Testing the significance of difference in Stage 1 is straightforward. Since
E(Z1) depends only on the value of a, the relevant hypothesis for test is
that both C-B and A-Br had the same value of a. Maximum likelihood
estimates were obtained with a single value of a, and with the four
remaining identifiable parameters allowed to take different values in the
C-B and A-Br conditions. The resulting maximum likelihood value was
compared with the maximum likelihood obtained with differing values of all
the parameters from the two conditions, giving a likelihood ratio test with
one degree of freedom.
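
The arithmetic of a likelihood ratio test with one degree of freedom can be
sketched as follows. The sketch assumes that the maximized log likelihoods
under the restricted hypothesis (a single value of a) and the unrestricted
hypothesis have already been obtained by some fitting procedure, which is
not shown; the function names and the log-likelihood values in the example
are hypothetical. For one degree of freedom, the upper-tail probability of
the chi-square distribution can be written exactly with the complementary
error function.

```python
import math


def likelihood_ratio_statistic(loglik_restricted, loglik_full):
    """Return -2 log(lambda), where lambda = L_restricted / L_full."""
    return -2.0 * (loglik_restricted - loglik_full)


def chi_square_1df_p_value(statistic):
    """Upper-tail probability of a chi-square variable with 1 df.

    If X = Z**2 with Z standard normal, then
    P(X > x) = 2 * (1 - Phi(sqrt(x))) = erfc(sqrt(x / 2)).
    """
    if statistic < 0.0:
        raise ValueError("statistic must be non-negative")
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    # Hypothetical maximized log likelihoods, for illustration only.
    stat = likelihood_ratio_statistic(loglik_restricted=-105.0,
                                      loglik_full=-100.0)
    print(f"-2 log lambda = {stat:.2f}, p = {chi_square_1df_p_value(stat):.4f}")
```

With one degree of freedom, statistics above about 3.84 correspond to
p < .05 and statistics above about 6.63 to p < .01.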

The question about the second stage can be asked in a variety of ways,
because of the involvement of several parameters. We chose to ask it in a
relatively weak form. Let b1, c1, and d1 be the second-stage learning
parameters for a C-B group, and let b2, c2, and d2 be the second-stage
learning parameters for the corresponding A-Br group. The hypothesis tested
allowed a to vary between conditions, and the remaining eight identifiable
parameters were restricted by requiring b1 = b2, c1 = c2, and d1 = d2. The
two second-stage performance parameters were allowed to vary freely. Thus,
the hypothesis could be satisfied by any combination of parameter values
with constant values of b, c, and d. This means that the test did not
require assumptions of equivalent performance in the intermediate stage,
nor did the test depend on any identifying restrictions
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Table 6-5 Tests or Invariance between A-B, and C-B 


Number 2 

of Items Condition Stage I Stage 
10 NoOT 14 2047 
10 OT 29.8** 35 
6 No PT, No or 1.1 AM 
6 No PT, OT 1.6 Zw 
6 PT, No OT 4 16.3** 
6 PT OT 1.9 11.0 


**denotes p < .01.
Source: From Greeno, 1970 


one degree of eoo 
oth tests in the six experimental "istribute 
4, which Should be asymptotically pe ru 
the 


The values giv 
as Y?(1) if th 
hypothesis w 


еп are —2 log, 
е null hypothes 


C- 
с, ог d had different values in А-В, and nin 
groups. We can apparently c differences in E(Z;) show th 
€ difficulty of kd piedi 
Ndition involving ten items the 
culty of completing 


second stage of learning, 
Tis expected in the second stage | 
i е first-list retrieval б 
the finding seems a а t 
follow from associationis 
greatest in situations where 

the Strongest, 


A-B, transfer, since consid 


Effect of Overtraining on A-Br

A major purpose of James and Greeno's experiments was to explore the effect
of overtraining on the initial list prior to transfer. In both the
experiments described earlier, there were additional groups in which
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Table 6- 
6 Mean Total Errors per Item in James and Greeno's Overtraining Experiments 


Number T 
о " 
f Items Responses Condition Overtraining Overtraining 
5 Adjective АВ, 6.46 1177 
10 Adjective C-B 4.56 4.20 
10 Numeral А-В, 4.21 4.31 
6 Numeral C-B 2.60 1.54 
6 Adjective A-B,, No Pretraining 521 5.16 
6 Adjective C-B, No Pretraining 2.69 2.63 
6 Adjective A-B,, Pretraining 3.43 4.18 
6 Adjective C-B, Pretraining 1.93 2.19 
6 Numeral A-B,, No Pretraining 3.12 95 
6 Numeral C-B, No Pretraining 1.62 1.04 
6 Numeral A-B,, Pretraining 2.34 2.15 
8 Numeral C-B, Pretraining 1.16 1.26 
Adjective А-В, 5.31 4.22 
(Numeral Stimuli) 


were paired with numeral responses. Table 6-6 shows the mean number of
errors in the transfer list for all the groups in both experiments, plus an
additional A-Br condition run with an eight-item list with the stimuli
being the numerals 1-8 and the responses common two-syllable adjectives.

The main reason for studying negative transfer after overtraining is to
provide further evidence on the question of whether negative transfer is
greater when the interfering associations are stronger. If learning is a
process of strengthening associations, then surely overtraining should have
the effect of producing associations that are stronger than those acquired
when a modest learning criterion is met. But the results indicate that in
six cases out of seven, there is no greater overall negative transfer
following overtraining on the initial list than there is when the initial
list was learned to a criterion of a single errorless cycle.

Return to Table 6-4, and compare the A-Br conditions with overtraining to
the corresponding conditions without overtraining. There was a larger value
of E(Z1) for the group with overtraining in each case. Statistical tests
indicated that these were significant; the results are in Table 6-7, which
gives values of -2 log λ for all comparisons involving lists with adjective
responses from the overtraining experiments. The main finding was greater
difficulty in Stage 1 for the overtrained groups in the three A-Br
conditions having adjective pairs. In Stage 2, the overtrained A-Br group
with 10-item lists had more difficulty than the criterion group, but the
overtrained A-Br group with six-item lists and pretraining had a
significantly lower value of E(Z2) than the criterion group. The
overtraining effect for the A-Br group
2 - = 


with eight numeral-adjective pa 
group with overtraining- 
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T ining 
Table 6-7 Tests of Invariance between Criterion and Overtraining
(Adjective Responses Only)


Number "P 
of Items Condition Stage 1 Stage 2 
10 A-B 8.9** 37,8** 
10 C-B 9 dd 
6 A-B,, No PT Tele pee 
6 C-B, No PT 1.0 Me 
6 AB,, PT 11.3** i 
6 C-B, PT 2-7 Ў 
8 А-В, 10,5%% l. 
*denotes p < .05; **denotes p < .01.


The major findin 


Overtraining on the 
A-B,. In а 


f 
~ ct 0! 
| У ral lack of effect 
g In these experiments was the general lack ared with 
ative transfer when C-B is suit 
5 Were used either as stimuli or res 


a an 
А i r list thé 
NO greater difficulty in the pe ea pair 
corresponding groups trained to criterion, In the groups with adjec 


К vane 
S AS S й i isadY 
and six-item lists, Overtrained A-B, groups apparently had a small poa 
tage in Stage 1, but in Stage 2, where most negative transfer was rre 


increasing neg 
Stage. Thus, the Over; 
against the implicatio 
be greater when the i 

What of the two 
adjectives was 
cont 


son 
E i seco 
Increased difficulty in the videnct 
sults is to provide further € 


1 
A rer shou 
theory that negative transfer 

associations 


to the gener, 
n 


additional negative transfer in that condition was produced by
organizational processes carried out by subjects during overtraining. With
shorter lists or with lists of numerals, the learning that preceded the
criterion probably was sufficient to permit easy retrieval of items.
However, with the long lists containing no numerals, it would not be
surprising if criterion could be reached with some items still in a state
where retrieval occurred with some difficulty. This difficulty could be
reduced during overtraining if relations among items were noticed and
stored in memory, providing a stronger organization of the list. And this
organization led to increased negative transfer.

The fact that overtrained groups with adjective pairs apparently had more
difficulty in Stage 1 might be interpreted as slight evidence for the
hypothesis that negative transfer is stronger when interfering associations
are stronger. However, in the light of all the other failures of that
hypothesis, we are inclined to look for another interpretation. If we
follow the hypothesis that Stage 1 involves storing an encoded
representation of the pair, then we must conclude that overtraining can
make that process more difficult. The hypothesis that seems most likely is
that during overtraining, subjects better organize their representations of
the list, and this interpair grouping causes the increase in Stage-1
difficulty. If we conclude that increased organization of the list can
produce increased negative transfer in the process of storing
representations of pairs, we imply that there probably is some negative
transfer affecting that process in any case of transfer. Considering the
results of the previous section, the conclusion has to be that these
effects are small relative to the magnitude of negative transfer on the
second stage, which we assume is the process of learning to retrieve pairs
reliably.


Pagel's Experiment: Comparisons with Effects of
Meaningfulness and Similarity


Pagel's (1973) experiment involving study of meaningfulness and similarity
was mentioned, and her results regarding initial acquisition were discussed,
in Chapter 4. The main purpose of her experiment was study of negative
transfer, and we now turn to her findings regarding comparison of A-Br and
C-B transfer lists. Recall that in all four experimental conditions, stimuli
were consonant-vowel-consonant (CVC) trigrams and responses were the
numerals 1-7. In two groups, the trigrams were English words; in the other
two they were nonsense. Also, one list of words and one of nonsense trigrams
had very dissimilar stimuli, and the other had very similar stimuli, formed
by using the same letters in many different trigrams. Illustrative stimulus
lists are shown in Table 4-8.

Pagel's estimates of the mean number of trials in each stage of learning
during transfer are in Table 6-8. (She presented complete details regarding
simplifying assumptions, goodness of fit, and estimated values of parameters.)


Table 6-8 Pagel's (1973) Estimates of Difficulty in Stages during Transfer

Stimuli              Transfer Condition    E(Z1)    E(Z2)
- 1.4 

Dissimilar words =) 1 re ja 
Dissimilar words АВ 141 ae 
Dissimilar onsense on 122 3. 
Dissimilar Nonsense UH les 1.66 
Similar Words Gd TS EE 
Similar ords АВ 4 rad 2.46 
Similar nsense А E .00 8.06 

N nsense ^ 3.57 7.65 
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:ance across 
sis variance а 
-) Pagel conducted rather strong tests of inva 


ters 
» the null hypothesis tested was that all the rus 
atistic were constant across groups: XD first 
ejection of equality of A-B, and C-B "issimilat 
ilar words, p — .01, and the group yes е secon 
on of equality of А-В, and C-B imn t MES 
that with dissimilar nonsense stimuli, / 


ed by 

; sfer, as measur 
t of negative transfer, as тег ord an 
5 
ame 

NP ings of Jà 
onsense. In both of these conditions, the dne dd transiel 

: ati 
€ confirmed; there Was substantially more nega 


ups 
К 6 aini wo gro 
age of learning than in the first. In the remaining t nts 0 


Pagel obtained the greatest amoun 


difference between C_B and A-B,, 
with dissimilar n 


and Greeno wer 


nd relatively small amou 
ages of transfer learning. | issimilaf 
amount of transfer obtained with uf 5 
Shows, we have obtained moderate sumer 
d stage with six-item lists of adjective-n words 
reason why Pagel’s pairs of three-letter 
ffered so Substantially, -eprese 
| t milar nonsense, however. probably rep 
quite an Interesting exceptio. 


n 
d i-i dos 
€ other Process involves organizing the rep the 


nt. We have assumed a ier 
Bel 'obably important in producing negative om 
The organization appropri ; pairs involving one reli 
retrieve pairs with d 
System with another са!>, 


tion with new stimuli, 
In the A-B, condition, the advanta 


: he 
8 already differentiated ~ 
l ally found, which we interp 
ctive Organization, 
consistent with Battig's (1966) intratask interference hypothesis, in which
it is asserted that when learning presents difficulty due to interference,
retention and transfer will be facilitated. However, the connection proposed
by Pagel between Battig's hypothesis and the result was the process of
acquiring interpair groupings, a process essentially corresponding to our
idea of forming a retrieval system. We think that a retrieval system is most
likely to be a negative factor in A-C and A-Br transfer, and we think it more
likely that Pagel's result is due to stimulus differentiation. Of course, this
conclusion is still consistent with Battig's general view that transfer is
facilitated by intratask interference in the initial list, because
similarities among the stimuli force subjects to develop strongly
differentiated representations.


COMPARISONS OF FOUR PARADIGMS

We now turn to experiments that included all four of the standard negative
transfer paradigms: C-D, A-C, C-B, and A-Br.
James' Experiment

One set of data was obtained by James (1968) as the transfer phase of an
experiment aimed mainly at studying retroactive interference. The data
regarding retroaction will be presented in Chapter 8. James' experiment used
lists of 10 adjective pairs presented on a memory drum with alternating cycles
of study trials and tests on all items in the list. Each study presentation and
test interval was for 4 seconds, and a 4-second interval separated study and
test cycles. Both the initial A-B list and the transfer list were presented to
a criterion of one errorless test cycle. All subjects learned the same list of
items in transfer; the items in the first lists were varied to produce the
relations between lists needed for the paradigms. There were 16 subjects in
each group.


The overall measures of difficulty in transfer gave typical results. The mean
total errors in the four transfer conditions were as follows: C-D, 1.07;
A-C, [...]; C-B, .86; A-Br, 2.14. Analysis of variance and pairwise
comparisons of groups were carried out, showing that A-C and A-Br each
differed from both C-B and C-D reliably, but the smaller differences between
C-B and C-D and between A-C and A-Br were not significant.

The Markov model was applied, and tests of simplifying assumptions
Plane nies initial vector could пне E = La s—1-—a, 
=e. and in addition. = а and е= 4 eus e e. Together these 
restrictions impose four restrictions on the identifiable parameters. The test
statistics for the four conditions were 1.52, 2.22, 1.53, and 4.12, for C-D,
A-C, C-B, and A-Br. All four values are comfortably within the acceptable
range of chi square with four degrees of freedom.

Table 6-9 presents the values of chi square obtained in testing goodness


of fit for the restricted model. (For A-B, one of the parameters had an 


(1968) as the transfer phase of an 


fficulty in transfer gave typical results. The mean 
Table 6-9 Goodness of Fit of the Restricted Model to James' Data


Trial of 
н 1 ror 
Errors Before Errors After Trials After P Last En 
Condition First Correct First Correct First Correct i: p" T 
i 2:5 (7, 
x 2.88 3.45 1.58 
is (9, 6) (3,1) (3,1) (6, 3) a 
.98 (5,2) 
— :37 M 1.34 ~ , 
=R ey (2, 0) (3,1) (3, 0) " 
+ Q 
A-B 7.78 75 2.48 о (8,4) 
у (9,5) (4,2) (5,3) (is E 
C-B 8.3 Al 1 Pn (0 
(5, 2) (1, 08) (1, 08) De 
* 


, ol 
псу 
3 eque 
" etical fre 
01. indicates too few cells with и 
€ nonzero degrees of freedom for the liberal bound. 


T 

гіс 

" ге 

;ithout i 

as more convenient to test the model w ~ freedom’ 
es the agreement but reduces a degree a 

uld have no effe: 2» 

appears to hay, po int 

, dying 

identify 7 

; an iden 5 

arameters, а, с, а, and p. Asan fü? Jame 

i articularly reasonable s, Ап 

Probability 


tions 


With the restrictions em 


of learning is E(Z,) - and for the Second stage, E(Z,)= (1 "Ps у 
values obtai ions are graphed in Figure 6-1. 


ps 
3 | r Stag 
ard, involving а for Stage | and c for all 
аглед out for 


A 
ON: 
њи 
all three Parameters compa 
Table 6-10 Estimated Parameter Values for James' Experiment
Condition a pe: aos 
A-C 50 41 e 


Figure 6-1 Estimated mean number of trials in the two stages of
learning for transfer lists in James' experiment. (Two panels, first stage
and second stage; conditions C-D, A-C, C-B, and A-Br.)


Pairs 

The results are in Table 6-11. In the first stage, the difference between C-D
and A-C was reliable. In the second stage, C-B differed significantly from
both A-C and A-Br. Reliable differences in performance in the intermediate
state were obtained between C-D and both the A-C and A-Br conditions. The
values shown are for the test statistic


Table 6-11 Tests of Parameter Invariance for James' Data

i. Invariance of a (First Stage Learning)
          C-D       A-Br      C-B
A-C       4.42*     .86       3.02
C-D                 1.96      .23
A-Br                          .99

ii. Invariance of c (Second Stage Learning)
          C-D       A-Br      C-B
A-C       .42       3.58      11.79**
C-D                 2.83      3.03
A-Br                          27.50**

iii. Invariance of p (Intermediate State Performance)
          C-D       A-Br      C-B
A-C       4.76*     .24       .51
C-D                 6.11*     .62
A-Br                          .88

* denotes p < .05; ** denotes p < .01.
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|| 
−2 log λ, which should be asymptotically distributed as χ²(1) under the null
hypothesis in every case.
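
The chi-square comparisons used for James' data can be checked with a short calculation. The sketch below is not part of the original analysis; it uses the closed-form upper-tail probabilities that hold for chi square with 1 and with 4 degrees of freedom, and the function names are illustrative assumptions. The statistics are the legible values quoted in the text and in Table 6-11.

```python
import math

def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability P(X > x) for chi square with 1 degree of freedom."""
    # X is the square of a standard normal variable, so the tail is erfc(sqrt(x/2)).
    return math.erfc(math.sqrt(x / 2.0))

def chi2_sf_df4(x: float) -> float:
    """Upper-tail probability P(X > x) for chi square with 4 degrees of freedom."""
    # For even degrees of freedom the tail has a closed form; with 4 df it is
    # exp(-x/2) * (1 + x/2).
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

# Likelihood-ratio statistics flagged as significant in Table 6-11 (1 df each).
for stat in (4.42, 4.76, 6.11, 27.50):
    print(f"chi2(1) = {stat:5.2f}  p = {chi2_sf_df1(stat):.4f}")

# Tests of the simplifying restrictions for the four conditions (4 df each).
for stat in (1.52, 2.22, 1.53, 4.12):
    print(f"chi2(4) = {stat:5.2f}  p = {chi2_sf_df4(stat):.4f}")
```

Under these formulas the four restriction tests all have tail probabilities well above .05, while the starred entries of Table 6-11 fall below it.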


Greeno's Transfer Experiment 


f difficulty of the two stages of learning eia re 
ditions of meaningfulness, The general proe рае 
described in Chapter 4, Up to five subjects pa e, When 
ach subject reached criterion at a different ue "P . 
i instruction was given to dec trials 
. they were given ikea same 
ist was presented using 


ter 
ae ions, to del 
t statistics for simplifying assumptions, 


appli 
“stage Markov model could be aPP g 


а 
strictions | 
: to apply the model under one set of restriction 
Cases being compared; 


jb therefore. we wil 
restriction e — q, b =q, 


Table6-12 Tests of Simplitying Assumptions in Greeno's Transfer Experime” 
pA 
e =q, Init., y al 
Stimuli Responses Paradigm E ai P = 3 ES : T ae - 
Words Words А-В C. 19.52, 
Words Words A-B, ге А 02 6 E 1037, 
Words Words А-В, C_B 7.90* 1:26 7. m 9 ее 
Words Words A-B, А-В, 22 55%% 4.61 8.2 б 3744, 
Nonsense Words А-В, C-D 1.00 1.34 23.06 14.69. , 
Nonsense Words А-В, A-C 74848 3 Ој са 107.90 
Nonsense Words А-В, C-B 10.074» 4.72 а 11.22 
Nonsense Words А-В, A-B, 1.84 59 10.15 4.67 , 
Words Nonsense А-В, CD 55 76** 50 4 з 37.48 
Words Nonsense А-В’ A-C 5.90 24 31.85 6.84. 
Words Nonsense А-В’ C-B 2225 id. 6 eae О 
Words Nonsense А-В; А-В, 37/5302 8949 oe 105.347, 
Nonsense Nonsense A-B, Ср 9774 3-80 SEA 13:59. 
Nonsense Nonsense A-B, A-C 111768 Е 13.457" 16.107: 
Nonsense Nonsense А-В, C-B 828* 37 11.45* 4,43" 
Nonsense Nonsense А-В. А-В. 4419€. 02 14.69** 


* denotes p = .05; ** denotes p — 


o 
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are shown in Table 6-13. The question is 
alid; therefore, the version of the model 
- d, used in obtaining the estimates. 
e upper bounds of degrees of 
d two more indicated rejection 


inci of testing goodness of fit 
tested V. онај estimates are У 
There ien p TERI ANONS E N b= 
isse: à 0 nonindependent tests. At th 
HE gc Hu yes rejection at p < ап reje 
At the lower bounds of degrees of freedom, 10 tests indicated rejection
at p < .01 and 18 more indicated rejection at p < .05.

Goodness of fit was poor enough to raise doubt about the validity of
the estimates. Therefore, we examined the nature of the discrepancies.
Table 6-14 shows theoretical and empirical means and standard deviations of
six statistics in all the conditions of the experiment. The column headed
"Before First Correct: A" has the distribution of trials before first correct,
conditional on there being no errors after the first correct response.
"Before First Correct: B" has trials before first correct, given one or more
errors after the first correct response. Estimation of seven parameters makes
it likely that the theory will agree with the observed means; indeed, only 2
of the 96 predicted means fell outside 90% confidence intervals computed from
the data. The question of interest involves the standard deviations. The model
assumes homogeneity of parameters for items and subjects; of course, that
must be false, but as in nearly all experiments, items were selected in an
attempt to produce roughly equal difficulty, and subjects were taken from a
fairly homogeneous group. To the extent that the usual experimental efforts to
obtain homogeneity failed, the variance among subjects and items should lead
to larger standard deviations in the data than predicted by the theory. On the
other hand, if the learning process were more complex than the one postulated
in the model, this could lead to considerably smaller variances than the model
predicts. A general property of learning in one or two discrete stages is that
performance is highly variable (cf. Restle & Greeno, 1970); if learning
occurred in three or more stages, and if parameters were fairly homogeneous
across subjects and items, then we would expect the empirical standard
deviations to be smaller than the theoretical ones.
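
The argument that heterogeneity inflates standard deviations can be illustrated with a deliberately simplified case. The sketch below is an assumption-laden stand-in, not the two-stage model itself: it treats a single stage as a geometric waiting time, mixes two hypothetical learning rates in equal proportions, and compares the result with a homogeneous geometric process that has the same mean. The rates 0.2 and 0.6 and the function names are invented for illustration.

```python
import math

def geometric_mean_sd(rate: float) -> tuple[float, float]:
    """Mean and standard deviation of a geometric waiting time on 1, 2, 3, ..."""
    return 1.0 / rate, math.sqrt(1.0 - rate) / rate

def mixture_mean_sd(rates: list[float]) -> tuple[float, float]:
    """Mean and SD when each item is equally likely to have any of the given rates."""
    n = len(rates)
    mean = sum(1.0 / a for a in rates) / n
    # Second moment of a geometric waiting time with rate a is (2 - a) / a**2.
    second_moment = sum((2.0 - a) / a ** 2 for a in rates) / n
    return mean, math.sqrt(second_moment - mean ** 2)

# Hypothetical heterogeneous population: half the items are hard, half are easy.
hetero_mean, hetero_sd = mixture_mean_sd([0.2, 0.6])

# Homogeneous population matched on the mean number of trials.
homog_mean, homog_sd = geometric_mean_sd(1.0 / hetero_mean)

print(f"heterogeneous: mean = {hetero_mean:.2f}, sd = {hetero_sd:.2f}")
print(f"homogeneous:   mean = {homog_mean:.2f}, sd = {homog_sd:.2f}")
```

With the mean held fixed, the mixed population has the larger standard deviation, which is the direction of discrepancy the passage attributes to unmodeled differences among subjects and items.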

In 22 cases, theoretical standard deviations fell outside 90% confidence
intervals computed from the data. In 17 of these, the empirical standard
deviations were higher than the predicted values. It seems to us that the most
plausible conclusion is that the model is a reasonable approximation of the
learning process, and the main deviation is heterogeneity among subjects and
items not assumed in the model.

Under the restrictions e = q, b = d, the model has seven theoretical
parameters and six identifiable parameters. The identifying restriction chosen
was c = d. Estimates of the six parameters in each of the twelve groups are
given in Table 6-15. It may be noted that the failures of the restriction on
the initial vector apparently were due to a greater amount of learning on the
initial study trial than on later trials. (As a rule, 1 − s was greater than a


Table 6-13 Goodness-of-Fit Chi-Square Statistics


Trial 
Errs. Errs. Tris. of j 
fore After After Las 
E а First 2 Ет. 
n Erri 
Stimuli Responses Paradigm Corr. Corr. Corr. i 1.65 
m i 23 
Words Words CD во 2.90 5.38 $0) (P 
(7, 10) (3, 1) (4, 2) e " 
** 5 38 4. 
Words — Words A-C 460 1.69 8.44 (6.1) (8,2 
(9,3) (3,1) (5,3) ha 31 
5 1.30 9) 
Words — Words C-B 3.88 1.80 2.75 40» Go 
(6,0) (2,0) (51) (4, sn 
93 i 
Words ^ Words A-B, 398 328 374 E » d 
(10, 4) (5,3) (5:80 320 
4 "7 
Nonsense Words C-D 4.29 1.10 3.16 n (8,2) 
SD Вл аз) 6, eo as 
Nonsense Words A-C 13.89 6.04 4.08 І 4 at: 
Малу — 16/4. — sy — til 1059 
Nonsense Words C-B 4.66 3.02 2.32 x (9, 3 
(10, 4) (4, 2) (6, 4) (7,2 aid 
Nonsense Words А-В, 13.08 12.82 13.13 16.13 (12, 6) 
So £3) 8H 1$ xl i 
* * E 18.3 
Words Nonsense C-D 2835 15.04 3.0 
6.96 8.33 a1 
51) — (45 (6,4) (12,7) 
ж жж * * 1325 
Words — Nonsense Á-C 23:35 743 6.24 11.38 (13, 
(17, 11) (5, 3) (7,5) (11,6) T 
* 7, 
Words Nonsense C-B 8.00 4.22 6.96 3:99 (7, р 
(8, 2) (3, 1) (5, 3) (6, 1) 7 
s * 20. 
Words Nonsense A-B, 11.68 13.22 27.94 11.92 (16. 10) 
(14, 9) (8, 6) (11,9) (13, 8) 
* LL m 20.04 
Nonsense Nonsense C-D 28.40 9.96 11.98 32.18 20. 14) 
08,22) (6) (10,8) (19, 14) CO 
N A Ms Je 
Nonsense Nonsense -C 2029 14.96 1254 13.38 30, 14) 
(25,20) 6 01,9 — (18,13) (С = 
ж * 2 
тав born pm dc 8.14 734 1298 "n 9) 
>» 10) (8, 6) (10, 8) (13, 8) Roc 
E .02 
лара 12) uu 27.90 9.80 E 12 
aD 0310. (16, 11) 
~ .05, +" denotes p — 9j. a indicat, 
И ав to have nonzero degrees of Sos 
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freedo, 


the liberal bound. 


enc 
9 few cells with theoretical frequ 
m for th, 
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815 91% [142 997 60€ 181 Sse 9ГЕ ory we аша v NIN 
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а iment 
Table 6-15 Parameter Estimates from Greeno's Transfer Experim 


Para- b у jh à 
Stimuli Response digm N 1 8 a 1 E qd ee 
SOS Weel XD ga En 450 335 .370 267 p 
Words Wonks €D soe “ee 397 409 410 38 42 
uod Words. 4g Sig Aos ЯШ 38 E 8 e 
Worde Ману Gg xe то 727 495 434 585 За 
Nonsense Words ^-C 280 914 (092. 200- „271. 2142 588 
Nonsense Words CD 296 $49 :541 — ,302  .344 265 362 
Nonsense Words А-В, 288 387 568 "136 239. 2704 426 
Nonsense Words C-B 304 535.366 .236 .340 210 4 
Words Nonsense A-C 320 83 Jc 427 288 347 75 
Words Nonsense C-D 296 :639 202 223 320 113 “406 
Words — Nonsense А-В, 304 843 161 .178 .201 .344 537 
Words Nonsense СВ 320 479 .680 258  .398  .862 3 
Nonsense Nonsense A-C 296 90 190.041 .203  .194 “403 
Nonsense Nonsense CD 30 513 168 04s 207 082 “395 
Nonsense Nonsense А-В, 280 654 183. 422 1170 259 316 
Nonsense Nonsense C-B 247 115 


328  .647 


247 118 „217 .207 eo 
че Of 
^" 9 б 1 i у Wi 
and t was greater than ab.) By allowing an extra second of study time on
the initial trial of each item, we apparently made that trial about equal in
effectiveness to later trials for the hardest conditions, but in most
conditions the extra second of study time produced an advantage over later
trials. Under the restrictions applied here, the expected numbers of trials in
the two stages of learning are


E(Z) | D 


E(Z;) = U — 20 — o). 


Figure 6-2 Estimated mean number of trials in the two stages of
learning for transfer lists composed of noun-noun pairs in Greeno's
transfer experiment.

Figure 6-3 Estimated mean number of trials in the two stages of
learning for transfer lists composed of CVC-noun pairs in Greeno's
transfer experiment.

Figure 6-4 Estimated mean number of trials in the two stages of
learning for transfer lists composed of noun-CVC pairs in Greeno's
transfer experiment.

Figure 6-5 Estimated mean number of trials in the two stages of
learning for transfer lists composed of CVC-CVC pairs in Greeno's
transfer experiment.

(Each figure plots mean trials for C-D, A-C, C-B, and A-Br in two panels,
first stage and second stage.)

The expectations calculated from the parameter estimates are in
Figures 6-2 through 6-5, with each figure showing the data from one of the
conditions of meaningfulness.

Significance of differences between conditions was assessed somewhat less
rigorously for these data than in other cases we have presented. The variances
are functions of the theoretical parameters; with the restrictions
b = c = d and e = q,


Wz) ~ 1 — па - 


=<) + 1 — tc) 
"vcre 
If we could observe the trials of transition between states, then the mean
trial in a stage of learning would have the sampling variance

v(Z̄) = V(Z)/N,

where N is the number of observations. The theoretical values of V(Z),
computed using the empirical estimates of parameters, should give at least
a rough approximation of the stability of the estimates of E(Z1) and E(Z2).
These estimates were calculated for the various experimental conditions, and
used to compute the 95% confidence intervals that will be presented below.

We begin our discussion of comparisons by noting that the main results
reported in the previous section comparing A-Br and C-B were repeated
here. There were relatively small differences between C-B and A-Br in Stage
1, but substantial differences in Stage 2.

E(Z1): Words/Words, (C-B) − (A-Br) = −0.26 ± 0.16;
Words/Nonsense, (C-B) − (A-Br) = −0.63 ± 0.21;
Nonsense/Words, (C-B) − (A-Br) = −0.20 ± 0.37;
Nonsense/Nonsense, (C-B) − (A-Br) = −0.46 ± 0.55.

E(Z2): Words/Words, (C-B) − (A-Br) = −0.76 ± 0.32;
Words/Nonsense, (C-B) − (A-Br) = −1.27 ± 0.48;
Nonsense/Words, (C-B) − (A-Br) = −2.14 ± 0.52;
Nonsense/Nonsense, (C-B) − (A-Br) = −1.12 ± 0.75.
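
The way a sampling variance of the form V(Z)/N turns into a "difference ± half-width" entry can be sketched as follows. This is only an illustration under stated assumptions: a normal approximation with the multiplier 1.96, independence of the two groups being compared, and made-up means, variances, and numbers of observations. None of the numbers are taken from the experiment, and the function names are hypothetical.

```python
import math

Z_95 = 1.96  # normal-approximation multiplier for a 95% interval (assumption)

def sampling_variance(var_z: float, n_obs: int) -> float:
    """Sampling variance of a mean: theoretical variance divided by N."""
    return var_z / n_obs

def difference_with_ci(mean_1: float, var_1: float, n_1: int,
                       mean_2: float, var_2: float, n_2: int) -> tuple[float, float]:
    """Difference of two independent means and the half-width of its 95% interval."""
    diff = mean_1 - mean_2
    # Variances of independent means add when the means are subtracted.
    pooled = sampling_variance(var_1, n_1) + sampling_variance(var_2, n_2)
    return diff, Z_95 * math.sqrt(pooled)

# Hypothetical inputs: (mean trials, theoretical variance, observations) per group.
diff, half_width = difference_with_ci(2.0, 4.0, 400, 3.0, 9.0, 400)
print(f"difference = {diff:.2f} ± {half_width:.2f}")
```

A difference whose interval excludes zero would be read as reliable in the sense used in the comparisons above.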


Differences between C-D and A-C in the first stage gave no consistent
pattern, and apparently were nonsignificant.

E(Z1): Words/Words, (C-D) − (A-C) = 0.03 ± 0.18;
Nonsense/Words, (C-D) − (A-C) = −0.29 ± 0.52;
Words/Nonsense, (C-D) − (A-C) = −0.23 ± 0.50;
Nonsense/Nonsense, (C-D) − (A-C) = 0.44 ± 0.80.

In the second stage, A-C appears to have been consistently harder than C-D,
although one condition did not give a significant difference, and another was
borderline.

E(Z2): Words/Words, (C-D) − (A-C) = −0.28 ± 0.28;
Nonsense/Words, (C-D) − (A-C) = −0.82 ± 0.43;
Words/Nonsense, (C-D) − (A-C) = −0.52 ± 0.42;
Nonsense/Nonsense, (C-D) − (A-C) = −0.51 ± 0.65.
For reasons discussed earlier in this chapter, we expect that in Stage 2,
the difference between A-Br and C-B should be greater than the difference
between A-C and C-D. This second-order difference was in the predicted
direction for all four conditions (as it was also in James' data; recall
Figure 6-1), and it was significant in two of the four cases.

E(Z2): Words/Words, (C-B) − (A-Br) − (C-D) + (A-C) = [...];
Nonsense/Words, (C-B) − (A-Br) − (C-D) + (A-C) = −0.45 ± 0.65;
Words/Nonsense, (C-B) − (A-Br) − (C-D) + (A-C) = −1.63 ± 0.67;
Nonsense/Nonsense, (C-B) − (A-Br) − (C-D) + (A-C) = −0.60 ± 1.00.
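
The second-order differences can be recomputed from the two sets of Stage 2 first-order differences given earlier. The sketch below assumes that the two first-order differences are independent, so that their half-widths combine in quadrature; that rule is an inference from the printed numbers, not something the text states. Rows are paired by arithmetic rather than by label: the values printed as −1.27 ± 0.48 and −2.14 ± 0.52 reproduce the second-order entries only when matched with −0.82 ± 0.43 and −0.52 ± 0.42 respectively, which suggests the Words/Nonsense and Nonsense/Words labels are interchanged between lists. The row tags are therefore neutral placeholders.

```python
import math

# (difference, half-width) pairs for Stage 2, transcribed from the text.
cb_minus_abr = {"row1": (-0.76, 0.32), "row2": (-1.27, 0.48),
                "row3": (-2.14, 0.52), "row4": (-1.12, 0.75)}
cd_minus_ac = {"row1": (-0.28, 0.28), "row2": (-0.82, 0.43),
               "row3": (-0.52, 0.42), "row4": (-0.51, 0.65)}

def second_order(first: tuple[float, float],
                 second: tuple[float, float]) -> tuple[float, float]:
    """Difference of two differences, with half-widths combined in quadrature."""
    diff = first[0] - second[0]
    half_width = math.hypot(first[1], second[1])
    return diff, half_width

results = {row: second_order(cb_minus_abr[row], cd_minus_ac[row])
           for row in cb_minus_abr}

for row, (diff, half_width) in results.items():
    print(f"{row}: {diff:+.2f} ± {half_width:.2f}")
```

Rows 2 through 4 agree with the printed second-order entries to within rounding. Row 1 is the Words/Words case, whose printed value is not legible; the same arithmetic gives a difference near −0.48.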
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ae with 
; ; x e conditions 
One final comparison of interest involves Stage 1. In the conditions with
nonsense responses, there were strong differences between groups with new
responses and groups who had the first-list responses during transfer.
E(Z): Words/Nonsense, I(C-D) + (A-C)] — 1((C-B) + e 035 
2 пе 


= ~ - (A-B,) 
Nonsense/Nonsense, M(C-D) + (A-C)] — 4((C-B) + (А-В) 


282 + 0.49. 


Clearly, it was an advantage for C-B and A-Br to be able to use the responses
they had used in the first list.

However, current associationist theory attributes the advantage of response
familiarity to a definite process of response learning that is assumed to
occur as a component of all associative learning. This would produce an
advantage of having the first-list responses in the conditions with word
responses, although of course the amount of that advantage would be smaller
than in nonsense responses and could be counteracted by a disadvantage from
backward associations. The data gave no evidence of any advantage.
E(Z,): Words/Words. M(C-D) 4 (А-С)] — С-В) -+ (А-В,)] 


: 
ET 

Nonsense/ Words. 3{(C-D) 4 (A-C)] — 3(C-B) + (A-B,)] 2 
2 2 d wee 

0.03 + 0. 
Discussion and Revision of Earlier Proposals

We have proposed mechanisms that could produce negative transfer in
either the first or second stage of learning during transfer. There is
evidence here of negative transfer in the second stage, which we attribute to
interference caused by the retrieval system that was acquired during
learning of List 1. The amount of negative transfer was greater for A-Br
than for A-C, consistent with the idea that more extensive changes in the
retrieval systems are required for A-Br to be mastered.

The other potential source of negative transfer that we hypothesized was
interference from first-list encodings of stimuli. Some evidence for this was
obtained: James found significant negative transfer in the first stage of A-C,
and Greeno found significant first-stage negative transfer in two A-Br
conditions. However, the amount of negative transfer in the first stage was
considerably less than in the second stage in both the A-C and A-Br
conditions in all of Greeno's observations, and only in the case of James'
A-C condition was there more negative transfer in the first stage than in the
second. We think this represents a further failure of a prediction implied by
associationist theory, that negative transfer should be greater earlier,
rather than later, in learning. The reason, as we have said before, is that
negative transfer caused by stronger associations should be greater than
negative transfer caused by weaker associations, and the first-list
associations should grow weaker, if they change in strength at all, during the
course of second-list learning.

The results in this section also contradict our earlier expectation (Greeno,
James, & DaPolito, 1971) that in the first stage negative transfer caused by
earlier encodings of stimuli and responses would be symmetrical. Our
earlier discussion clearly overlooked the importance of response familiarity
in the case of nonsense responses, and allowance for that must be made.
We conclude that the major facilitating effect of familiarity with responses
was in the first stage, as would be expected if familiarity with responses
facilitates encoding. This agrees with conclusions obtained in the study of
acquisition, particularly in Humphreys' experiment and in its replication,
discussed in Chapter 4. There is some uncertainty now regarding what should
be said about the case of meaningful responses. The lack of difference between
conditions with old and new responses could be due to a lack of importance
of response familiarity in encoding when responses are meaningful, or it
could be due to a fortuitous cancelling of the advantage of familiarity with
a disadvantage of first-list encodings of the kind hypothesized earlier. We
are inclined to prefer the hypothesis of no effect, since it seems simpler and
therefore more susceptible to future empirical test.

In addition, the general pattern of findings now available, including
Greeno's experiment reported here and Pagel's experiment discussed in an
earlier section of this chapter, indicates that there are differences
between C-B and A-Br, and apparently also between C-D and A-C. These
differences seem to be small in most cases, so that significant effects are
often not obtained. However, all the significant effects that have been
obtained have been in the direction of negative transfer in conditions having
the same stimuli in transfer as in initial training. Thus, we are led to the
conclusion stated at the beginning of this chapter, that persistence in
encoding produces negative transfer primarily on the stimulus side.


TRANSFER WITH SYNONYMOUS RESPONSES:
GOGGIN'S EXPERIMENT

We are grateful to Judith Goggin, who provided the last data we will
discuss in this chapter. Her experiment used lists of eight CVC-adjective
pairs; the first (A-B) list was presented to a criterion of one perfect trial,
and all subjects received 20 trials on the appropriate second list. The
conditions
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Table 6-16 Analysis of First Stage in Goggin's Data 


Condition    1 − s    a       E(Z1)
C-D          .404     .551    2.08
A-C          .358     .267    3.40
A-Br         .211     .298    3.65
A-B'         .704     .265    2.12
A-B'r        .549     .261    2.73


to be discussed here were C-D, A-C, A-Br, A-B', and A-B'r. In group A-B',
each response was a synonym of the response in List 1 that was paired with
the same stimulus. In group A-B'r, the synonymous responses were paired
with different stimuli from their counterparts in List 1.

The version of the two-stage model that was acceptable for all groups
used the restriction e = q, b = d, the same restriction used for Greeno's
transfer experiment. In Table 6-16 we give the estimated values of parameters
that are involved in the first stage of learning, along with the calculated
values of E(Z1), the expected number of trials in the initial state. Tests of
significance were carried out for the differences in the values of a. It was
found that C-D differed significantly from all the other groups, p < .01 in
all cases, but that none of the other four groups differed from each other.
Note that this does not indicate equality in the total amount of negative
transfer in Stage 1. It is evident that A-B' had little or no negative
transfer relative to C-D in the mean number of trials needed to accomplish
the first stage of learning. However, this equality resulted from two
compensating effects: first, A-B' had more items that were through the first
stage after the first trial; second, those items that remained in the initial
state after the first trial had a rate of learning approximately equal to
those of the other negative transfer groups.
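
The two compensating effects can be made concrete with the entries of Table 6-16. The relation used below, E(Z1) = 1 + s/a, is not stated in this passage; it is an inference, adopted here because it reproduces every tabled value of E(Z1) to within rounding. Its reading is that every item takes at least one trial, and an item still in the initial state after that trial (probability s) needs a further geometric number of trials with mean 1/a. The dictionary and function names are illustrative only.

```python
# Table 6-16 entries: condition -> (1 - s, a, tabled E(Z1)).
table_6_16 = {
    "C-D":   (0.404, 0.551, 2.08),
    "A-C":   (0.358, 0.267, 3.40),
    "A-Br":  (0.211, 0.298, 3.65),
    "A-B'":  (0.704, 0.265, 2.12),
    "A-B'r": (0.549, 0.261, 2.73),
}

def expected_first_stage_trials(one_minus_s: float, a: float) -> float:
    """Inferred relation E(Z1) = 1 + s/a (an assumption checked against the table)."""
    s = 1.0 - one_minus_s  # probability of remaining in the initial state after trial 1
    return 1.0 + s / a

for condition, (one_minus_s, a, tabled) in table_6_16.items():
    computed = expected_first_stage_trials(one_minus_s, a)
    print(f"{condition:6s} computed = {computed:.2f}  tabled = {tabled:.2f}")
```

On this reading, A-B' matches C-D in E(Z1) even though its rate a is close to that of A-C, because a much larger share of its items leave the initial state on the first trial.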


Table 6-17 shows the р; estimated for the second stage of 
= 0. The parameter r/(1 — 5) 15 
| х У of completi cond stage 
on the first trial, given that the first stage is pleting the second 


Table 6-17 Analysis of Second Stage in Goggin's Data 
Condition tJ — s) d y ERES 
р E(Zj) 
C-D .849 .536 576 
: 51 

AS 372 .394 174 e vaa 
А-В, 296 330 192. 432 15 
А-В .533 392 .547 408 3:10 
АВ, 1198 280 178 299 aa 
3.82 
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transfer groups: A-B'(p < .05), А-В,(р < :01), and А-В;(р < .01); also, 
А-В’ differed significantly from A-B,(p < .05). 

Goggin's data for C-D, A-C, and A-Br are consistent with the general
trends that we have found throughout our analyses. The strongest negative
transfer was obtained for A-Br in the second stage. Negative transfer was
also obtained, though apparently to a lesser degree, for A-C in the second
stage, and for both A-C and A-Br in the first stage.

In addition to confirming earlier results, Goggin's data provide new
information about the effects of having responses synonymous with those of an
earlier list. The most interesting suggestions come from the A-B' condition.
Tables 6-16 and 6-17, combined with the results of the significance tests
mentioned above, indicate that there was negative transfer in A-B' during
the second stage, and that there was negative transfer for some items in the
first stage, that is, those items that did not escape the initial state
immediately. Statistical tests comparing A-B' and A-C on the learning
parameters other than s (that is, a, c, d, and t/(1 − s)) showed no
significant difference. However, when s was added to the set, a clear
rejection of parameter invariance between A-B' and A-C was obtained,
χ²(3) = 42.1, p < .01.

The possibility comes to mind that in A-B', some items benefit from
a form of positive transfer, at least in the encoding stage, that has the same
form as the positive transfer discussed in Chapter 5. If the subject recognized
the synonymy of a List-2 response and the corresponding List-1 response,
that relation could be used as a basis for encoding the pair in List 2. On the
other hand, consider items whose relations with List-1 items were not
recognized. These would not be comparable with C-D control items, since C-D
items have new stimuli, and A-B' items have stimuli that were used in List 1.
These items, for which positive transfer fails, should suffer negative transfer
of the A-B, A-C variety.

The pattern of results obtained for Goggin's A-B' group fits with this
story, since the probability of storing a representation on the first trial
(1 − s) was high for A-B', but in all other respects A-B' was apparently
indistinguishable from the A-C condition. Note that the facilitation in
storage seems not to have been followed by very substantial facilitation in
learning to retrieve. This would be expected if only a fraction of the A-B'
items transferred positively, on the hypothesis that learning to retrieve
involves developing a system of relationships among various items in the list.
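
One rough way to put a number on "only a fraction of the A-B' items" is a two-component mixture. The sketch below is a speculative illustration, not an analysis from the text. It assumes that a recognized item is stored on the first trial with certainty, that an unrecognized item behaves exactly like an A-C item, and that E(Z1) = 1 + s/a, a relation inferred from Table 6-16 rather than stated in the passage. The input values are the Table 6-16 entries for A-B' and A-C; the variable and function names are hypothetical.

```python
# Table 6-16 entries used here.
ONE_MINUS_S_AB_PRIME = 0.704   # first-trial storage probability, A-B'
ONE_MINUS_S_AC = 0.358         # first-trial storage probability, A-C
A_AC = 0.267                   # later-trial storage rate, A-C
TABLED_EZ1_AB_PRIME = 2.12     # tabled E(Z1) for A-B'

def recognized_fraction(mixed: float, baseline: float) -> float:
    """Solve mixed = f * 1 + (1 - f) * baseline for the fraction f."""
    return (mixed - baseline) / (1.0 - baseline)

fraction = recognized_fraction(ONE_MINUS_S_AB_PRIME, ONE_MINUS_S_AC)

# Expected first-stage trials implied by the mixture: recognized items take one
# trial; unrecognized items follow the A-C value 1 + s/a.
ez1_ac = 1.0 + (1.0 - ONE_MINUS_S_AC) / A_AC
predicted_ez1 = fraction * 1.0 + (1.0 - fraction) * ez1_ac

print(f"implied fraction of recognized items = {fraction:.3f}")
print(f"mixture E(Z1) = {predicted_ez1:.2f}  (tabled A-B' value {TABLED_EZ1_AB_PRIME:.2f})")
```

Under these assumptions roughly half of the A-B' items would be treated as recognized, and the implied E(Z1) lands close to the tabled value for A-B'.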

The results regarding A-B'r are less interesting, but seem sensible. None of
the three negative transfer groups, A-Br, A-B'r, and A-C, differed
significantly from either of the others in any of the learning parameters.
One might imagine that A-B'r would include components of processes
involved in A-C and A-Br transfer, with recognized items like A-Br and
unrecognized items like A-C. Were this the case, then if A-Br differed
substantially from A-C, the results for A-B'r might be found to share features
of the other two or to show intermediate values of parameters. In Goggin's
data, A-C and A-Br did not differ significantly in either stage of learning,
and it is therefore reasonable that A-B'r did not differ significantly from
either of them.


SUMMARY AND DISCUSSION 


We have reported five empirical findings about negative transfer:

1. The quantitative effect of negative transfer found in the second stage
of learning, as defined by a two-stage Markov model, was uniformly greater
than that found in the first stage. This is firmly established for A-Br relative
to C-B, but also seems to hold for A-C relative to C-D.

2. The amount of negative transfer in the second stage found in A-Br
(relative to C-B) was greater than that found in A-C (relative to C-D).

3. When nonsense responses were used, there was positive transfer in the
first stage for conditions in which responses were the same in both initial
training and transfer; there appears to have been no positive or negative
effect of the responses when the responses were meaningful.

4. When negative transfer did occur in the first stage, its rather small amount
consistently retarded A-C compared to C-D, and A-Br compared to C-B;
that is, it seems to have been due to the presence of first-list stimuli in the
transfer list that were paired with new responses.

5. When numerals were used as stimuli or responses, overtraining on A-B
caused no measurable increase in negative transfer in an A-Br list. Using
pairs of adjectives, overtrained subjects showed more negative transfer in the
first stage than did subjects trained only to criterion. With six-item lists
there was a compensating decrease in negative transfer in the second stage.
With a longer list of 10 adjective pairs, overtraining increased negative transfer
in both stages of learning.

In addition to these five findings, we have noted Pagel's (1973) finding
that there was less negative transfer in A-Br with similar nonsense stimuli
than with dissimilar nonsense stimuli or with similar word stimuli. This was
primarily an effect involving the second stage, where most of the negative
transfer occurred in the other conditions.

Finally, we have presented an interesting but highly tentative finding arising
from analysis of Goggin's data collected with an A-B' condition. A-B'
differed from A-C only in one parameter, the probability of completing
the first stage of learning on the first study trial. This suggests that A-B' is
partly a positive transfer and partly a negative transfer paradigm, with the
difference occurring in an all-or-none manner, similar to the positive transfer
reported in Chapter 5. Because it is based on a single observation, this
suggestion must be taken as tentative for now.

Relation to Associationist Theory

Two of these findings differentiate between the cognitive theory
and the associationist theory of interference. These are,
first, that more negative transfer occurs in the second stage of learning than
in the first; and second, that overtraining on the A-B list does not generally
increase negative transfer in A-Br. Both findings seem to contradict the implication
of associative interference theory, that associations will cause more negative
transfer if they are stronger than if they are weaker.

The other results seem compatible with associative interference theory as
we understand it. The greater negative transfer found in A-Br than in A-C is
explained by the response selector mechanism, and we consider it possible
that something similar in function to response-set selection might operate,
although we think the mechanism probably would involve
an organized representation of relations among pairs rather than simply
a set of responses, and we call it a retrieval plan. We think it more likely that
the greater difficulty in A-Br is caused by a need for more extensive modification of
the first-list retrieval network, but the fact is explained by either theory.

The positive transfer found for C-B and A-Br with nonsense responses
fits nicely with the associationist assumption of a response-learning phase
in associative learning. The lack of an effect of response familiarity when
responses are meaningful could be interpreted as evidence against the
hypothesis of response learning. However, the possibility exists that the
advantage of response learning is compensated by a disadvantage in C-B and
A-Br of interference due to backward associations.


Relation to Encoding Variability Theory

Martin's position (1972) and ours seem entirely consistent at the level of
concepts and principles. The conceptual similarity may have been obscured
by Martin's labeling his view as a theory of encoding variability, and our
use of the phrase "principle of persistent encoding"; these are simply two
sides of the same hypothesis. Negative transfer occurs (at least in part) because
subjects do not vary encodings with complete flexibility. This persistence
retards learning in some transfer tasks. However, in conditions favoring
changes in encoding, the resulting variability would decrease the amount of
negative transfer to be expected.

Martin (1968) has argued that if two sets of stimuli differ in their ease of
recoding, there should be less negative transfer found in lists composed of
pairs with the stimuli that admit easier recoding. That seems a reasonable
deduction to us. Martin then assumed that nonsense stimuli are easier to
recode than meaningful words. He derived the prediction that negative
transfer should be greater for lists with word stimuli than with nonsense
stimuli, and he gave evidence supporting that prediction. The empirical
results presented in this chapter are inconsistent on whether more or less
negative transfer is obtained with word stimuli than with nonsense stimuli.
We question whether it is possible to compare amounts of negative transfer
meaningfully in conditions involving different kinds of material, given
the present state of nonquantitative theory about parameters of learning.
However, it is customary to compute percentage of transfer relative to the
difficulty of learning the initial list, and that probably provides an
index that is comparable among conditions. The usual procedure uses each
subject's first-list performance as a base for that subject's percentage
transfer. We cannot obtain estimates of difficulty in the two stages of learning
for individual subjects simply because each individual gives too few observations.
However, we have computed percentages of transfer for groups of
subjects, using the formula

\[
\%T_i = \frac{Z_{i1} - Z_{i2}}{Z_{i1}} \times 100,
\]

where $Z_{i1}$ is the value of $E(Z_i)$ for the initial A-B list, and $Z_{i2}$ is the value of
$E(Z_i)$ for the transfer list. The values obtained for the 16 conditions in
Greeno's transfer experiment are in Table 6-18.
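The percentage-of-transfer index can be illustrated with a short computation. The following is only a sketch: the function names are ours, and the stage durations passed to `percent_transfer` are hypothetical values chosen for illustration, not estimates from any experiment reported here. The second function expresses the amount of negative transfer as the difference in %T between an experimental paradigm and its control.

```python
def percent_transfer(z_initial: float, z_transfer: float) -> float:
    """Percentage of transfer for one stage: 100 * (Z_i1 - Z_i2) / Z_i1.

    z_initial  -- E(Z_i) for the initial A-B list (assumed positive)
    z_transfer -- E(Z_i) for the transfer list
    Positive values mean the transfer list was learned faster than the
    initial list; negative values mean it was learned more slowly.
    """
    if z_initial <= 0:
        raise ValueError("z_initial must be positive")
    return 100.0 * (z_initial - z_transfer) / z_initial


def relative_transfer(pt_experimental: float, pt_control: float) -> float:
    """Difference in %T between an experimental paradigm and its control
    (for example A-C minus C-D); negative values indicate negative transfer."""
    return pt_experimental - pt_control


# Hypothetical stage durations, for illustration only.
print(percent_transfer(2.0, 1.5))   # transfer list easier than initial list
print(percent_transfer(2.0, 2.5))   # transfer list harder than initial list

# Two %T values compared as experimental versus control.
print(round(relative_transfer(35.6, 45.2), 1))
```

Because the index divides by the difficulty of the initial list, two conditions with the same absolute difference in trials can yield different percentages when their initial lists differ in difficulty.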

If we take the amount of negative transfer in A-C as the difference between
%T for A-C and C-D, then with word responses in both stages of learning
there was less negative transfer with word stimuli (+2.1% and -12.6%)
than there was with nonsense stimuli (-9.6% and -33.4%). The direction
of the difference accords with Martin's assumptions. However, with nonsense
Table 6-18 Percentage of Transfer for Greeno's Experiment

                                   %T1              %T2
Stimuli    Responses   Paradigm    (First Stage)    (Second Stage)

Words      Words       A-C         ?                ?
Words      Words       C-D         ?                ?
Words      Words       A-Br        ?                ?
Words      Words       C-B         12.3             ?
Nonsense   Words       A-C         35.6             ?
Nonsense   Words       C-D         45.2             12.6
Nonsense   Words       A-Br        30.9             46.0
Nonsense   Words       C-B         ?                -11.8
Words      Nonsense    A-C         33.1             39.8
Words      Nonsense    C-D         38.1             21.4
Words      Nonsense    A-Br        56.2             40.0
Words      Nonsense    C-B         60.7             ?
Nonsense   Nonsense    A-C         28.0             59.3
Nonsense   Nonsense    C-D         22.0             31.4
Nonsense   Nonsense    A-Br        60.4             40.8
Nonsense   Nonsense    C-B         66.7             21.8


42.2 
responses, the amount of negative transfer appears to have been greater
with word stimuli (-5.0% and -18.6%) than with nonsense
stimuli, which seems inconsistent with Martin's conclusion.
The situation with A-Br transfer is similar in some respects. If we take
the amount of negative transfer as the difference in %T between A-Br and
C-B, then the amount of negative transfer in Stage 1 was about equal for
word stimuli and nonsense stimuli, both where responses were words (-18.2%
and roughly -20%) and where they were nonsense (-4.5% and -6.4%). In the
second stage, the pattern was similar to that found for A-C transfer. When
the responses were words, there was less negative transfer for word stimuli
than nonsense (-51.6%), but when responses were nonsense,
there was more negative transfer for word stimuli (-78.0%) than nonsense.
It is not clear to us that this failure to corroborate Martin's finding should
be taken as critical evidence against the hypothesis that ease of recoding
should produce decreased negative transfer, as both Martin's theory and ours
would predict. The problem is in the assumption that it is harder to find new
encodings when words are used as stimuli. We wonder whether it may
sometimes be easier to find new encodings for word stimuli, especially when
possibilities for mediational and imaginal encodings are included. In any
case, we are dubious about comparisons based on such arbitrary indices as
percentage transfer in cases where different bases have to be introduced for
measurement.

On Inference from Parameter Estimates

Greeno, James, and DaPolito (1971) remarked that the duration of the
first stage of learning, estimated by E(Z1), is "highly correlated" with the
number of errors occurring before the first response. This was intended as
a didactic remark, to provide intuition regarding the meaning of the statistic.
Postman and Underwood (1973) have taken the correlation rather more
seriously than we suspect it should be taken, and have presented observed
mean trials of first correct responses from some previously published experiments
that show substantial differences between C-B and A-Br in the number
of errors before a correct response occurs. The statistics are accompanied by
the following remarks (among others):

    Since the measures considered by Greeno et al. were fragmentary and based on
    parameter estimates of unknown validity, we present in Table 2 more direct
    measures of the index said to be highly correlated with the duration of the first
    stage, viz. the number of errors before the first correct response to an item.
    (Postman & Underwood, 1973) ... for whatever discrepancies
    there appear to be between the findings of Greeno et al. and ours, we must for
    the present limit ourselves to saying that our measures were determined
    from the actual experimental observations whereas theirs were parameter
    estimates. (Postman & Underwood, 1973, p. 36)

Postman and Underwood's remarks raise several issues. The one we can
deal with most easily is the question of what we meant by saying that the
duration of Stage 1 is correlated with the trial of the first correct response.
Both are random variables according to the model we use to measure learning,
and they are positively correlated. We think the correlation is generally
quite large. The correlation depends on parameter values, and its expression is
quite complicated for the general model. It becomes manageable for simpler
Versions of the model, and if P—46s—l-—a ге аб, бе с=<ф 

e = а, the square of the correlation is 


р? (1 — а)(р + ge)? 


x (1 — ap qc) -- à?q(Y — c) 


at we have utilized here satisfied A 
relatively simple expression P DIS 
by James (1968), presented in * 

chapter. Using the parameter values given in Table 6-10, the correlations
between number of trials in Stage 1 and the number of errors before the first
correct response were .82, .88, .69, and .84 for conditions A-C, C-D, A-Br,
and C-B, respectively.
The theorem that giv ation between trial of we 
correct response and tr nitial state of the Marko 
chain refers to а singl alues, operating in some expel’ 
araa EZ ae Statistics are compared, as we have gee 


a 


? i 
od have compared the trial of f 


n 

these measure the same pent 

Correlation” js inv statis 
indi olved, f one 

to indicate changes p ; ed. Use о 


rrelated, Sy Ose we want to test the 
ohydrate intake j pp Heman 


га 
ate diet, the sige 
his manipulation 5 


index can give misleading results. We do not know th 
and the number of errors before а со 

variables, and it certainly would be 
use the mean trial of the first corre 
and computer time needed to obta 
present the empirical results obtain 
Table 6-19. Asterisks in the table re 


at use of a correlated 
€ extent to which E(Zi 
Table 6-19 Comparison of Statistics for Estimating Effects of Transfer (Greeno's
Experiment)

                                   Trials                  Trials between
                                   before First            First Correct
Stimuli    Responses   Paradigm    Correct       E(Z1)     & Criterion      E(Z2)

Words      Words       A-C         1.94          1.35**    .45              ?
Words      Words       C-D         1.80          1.38      .34              .85
Words      Words       A-Br        1.94          1.50*     .76              1.42
Words      Words       C-B         1.48          1.24      .24              .66
Nonsense   Words       A-C         3.00          1.94**    .96              2.15
Nonsense   Words       C-D         2.13          1.65      .54              1.33
Nonsense   Words       A-Br        ?             2.08      1.50             2.75
Nonsense   Words       C-B         2.32          1.45      .63              1.48
Words      Nonsense    A-C         3.64          3.01      .98              2.16
Words      Nonsense    C-D         3.41          2.78      .60              1.65
Words      Nonsense    A-Br        2.88          1.97**    2.03             3.26
Words      Nonsense    C-B         2.01          1.77      .43              1.12
Nonsense   Nonsense    A-C         ?             ?         1.99             3.16
Nonsense   Nonsense    C-D         ?             ?         1.60             3.25
Nonsense   Nonsense    A-Br        3.95          2.89      2.66             4.29
Nonsense   Nonsense    C-B         ?             ?         1.72             3.18


зза between the conclusions that would be reached using the two 
statistics.

The conclusions involve the relative amounts of transfer observed in the
two stages. It is comforting that in the majority of cases, approximately the
same conclusion would be reached using either measure. However, there are
a few substantial exceptions. The most striking is the comparison between
A-C and C-D with nonsense stimuli and word responses. Using the estimates
of mean trials before first correct response and mean trials between first
correct and criterion, A-C is .87 trials worse in the first stage and only .42
trials worse in the second stage. However, using the estimates of E(Z1) and
E(Z2), A-C is only .29 trials worse in the first stage and is .78 trials worse in
the second stage. In general, the estimates obtained from the model give
smaller estimates of effects in the first stage and larger estimates of effects in
the second stage than do the directly countable statistics.

The difference between the two measurements relates to
theoretical issues in an important way. We have argued that the greater
amounts of negative transfer measured in the second stage are evidence
against associative interference theory, and if the alternative measures shown
in Table 6-19 were used, our evidence would be considerably attenuated.
This forces the issue of comparative validity of the two statistics.

It seems to us that Postman and Underwood have neglected some elementary
concepts in statistics when they remarked that their measures "were
determined from the actual experimental observations," in contrast to E(Z1)
and E(Z2), which are "parameter estimates." The contrast is nonexistent; all
parameter estimates are determined from actual experimental observations.

are two important methodological а 
trials before the first correct response oe 
first correct response is much ue 
hat Postman and Underwood referred to wher 


A : « ~ -imental 
they said their measures “were determined from the actual experime 


observations,” 


Possible to bring actual experimen- 
f whether the assumptions are 
Serious for a measure like = 
9 theory is provided to соппес 


with al process. 


а psychologic 


In this chapter and the next we have two main purposes. We will present
specific information about procedures and results of experiments on forgetting
by DaPolito (1966) and James (1968) that have not been published
previously, although their main findings have been summarized (Greeno,
James, & DaPolito, 1971). We also show how the Gestalt and information-processing
ideas of association we have been using can be applied to the
interpretation of some main facts about forgetting. In this chapter we discuss
DaPolito's studies and the theory of proactive forgetting, because the theoretical
issues seem simpler, and because the findings and hypotheses we present
here will be used in our discussion of retroaction in Chapter 8.

Suppose a subject learns an association A-C, after having learned A-B
earlier. The subject's retention of A-C may be less successful because of the
earlier A-B learning. Thus, the interfering effect of learning A-B operates
on the association learned later in time, and for this reason it is called
proactive interference.

There is a difficult problem in distinguishing proactive interference with
retention from negative transfer; indeed, we will assert that the two processes
are not identifiably different. The difficulty arises because tests of retention
must by definition involve material learned earlier. If retention is poor, it may
indicate that the material was not learned well; if it had been learned well
enough, why should it not be remembered?

At the theoretical level, the needed distinctions can be clearly made. We
propose the following: Let Condition C denote a control condition and
Condition I denote a condition where some interference is known to occur.
We will say that the two conditions are equal in learning if the same information
is stored at the end of training in both conditions. (Note that this allows
for negative transfer in that it may take longer for learning to occur in e js 
i ; the point of the definition is to specify wha ill 
owledge when learning is complete.) Ne i if 
are equal in retention over a specified interva If 
of information are lost during the interval. of 
5 of information are present at the beginning = 
equal retention has to be defined in relation to a the 


brs eS itions 
equal retention would occur if the two cond 
ements alon 


ally, we will 


ces produced at the time of eat 
ation are in memory when the a 
tformance will be the same. If th 

ual in information available when the test occurs. pe 
i ation to a theoretical function Я 
of information in memory, and een 
ccording to the same function. wit 

to the function, a He 
f interference that they specify. 1t 1" 
Important to be clear about different Possibilities, McGeoch (1942) explained 
©Mpetition. This explanation assumes 
tion in memory for A-C assochiliuns. 

assumes a difficulty in retrieva 
ad been learned previously: 


Р ` 1938 
ad some credence (Wulf, 1938) 
© assimil 


ation of memory traces. 
Creates similar traces 11 
by becoming assimilated 
Tetention of information 


i ndition I than in a cor 
duc a UID here the A-C trace would be 


retained more completely. 
The hypothesis that we think most plausible is that A-C items (or other
materials that suffer proactive interference) are not encoded as well at the
time of learning as are their comparable controls; therefore, the occurrence
of greater forgetting in Condition I is due to a difference in the amount and
kind of information stored in memory when the items are studied. Of course,
any theory that assumed less learning of A-C would predict less retention;
thus, associationist assumptions could also explain greater forgetting of A-C.

It would be helpful if it could be determined which of the three possible
loci was responsible for the detriment in performance that we identify as
proactive interference. Unfortunately, we probably cannot. We can observe
only the output of three processes: learning, retention, and retrieval. A
difference in performance on a test between two conditions could be produced
by a difference in any of them. It might be thought that if a list of A-C items was
learned to the same criterion of learning in a Condition I (with previous A-B
learning, for example) and in a Condition C, then equal storage of information
would be achieved. However, that equality clearly need not always be
achieved. One possibility is that when criterion is reached, more
items in one condition than the other stay in short-term memory; large
differences in the amount of information stored in long-term memory about
items in the two conditions could occur. More significantly, representations
stored in long-term memory may have varying degrees of resistance to
forgetting or to interference from later events.
Although it seems unlikely that we can discriminate between theories by
finding at what point interference occurs, other empirical questions do seem
to permit differentiation. The one we will focus on is statistical dependence
in performance between pairs of items that would be expected to interfere
with each other. In the case to be studied in detail here, two items, A-B
and A-C, have the same stimulus and different responses. After the two
items have been studied along with a number of other items the items are
tested. In each test the stimulus A is presented and the subject tries to give
both of the responses. The resulting observations permit estimation of the
probabilities of recalling both responses, $P(B)$ and $P(C)$, and of conditional
probabilities $P(C \mid B)$ and $P(C \mid \bar{B})$.
According to most theories, interference between A-B and A-C occurs
in a way that should produce a negative dependency between recall of B and
C. Thus, we should expect $P(B \mid C)$ to be less than $P(B \mid \bar{C})$, at least at first
thought. In DaPolito's experiments, the surprising result $P(B \mid C) = P(B \mid \bar{C})$
was obtained; that is, recall of the two responses was stochastically independent.
There has been a good deal of discussion of the implications of independence
between pairs of items since DaPolito's results were obtained,
including questions about artifacts in the statistical tests (Hintzman, 1972;
Martin & Greeno, 1972) and questions about whether statistical dependence
should be taken as evidence contradicting various theoretical assumptions
(Martin, 1971; Postman & Underwood, 1973). Because the issues are relatively
complex, we will postpone discussion
until we have presented the experiments. It may be helpful, however,
to give a rough sketch of the general view that leads to the expectation of
negative dependency. This rough sketch in fact characterizes the state of mind
that was shared by DaPolito and Greeno when DaPolito's experiments were
begun.

1 $P(C \mid B)$ is the probability that response C is given correctly, given that response B is
given correctly. $P(C \mid \bar{B})$ is the probability that response C is given correctly, given that
response B is not given correctly.
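The quantities involved in this argument can be estimated directly from paired test outcomes. The sketch below is illustrative only: the function name is ours, and the ten outcomes are hypothetical data constructed so that recall of C does not depend on recall of B; they are not DaPolito's observations.

```python
from typing import Dict, Iterable, Tuple


def recall_probabilities(outcomes: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Estimate P(B), P(C), P(C|B) and P(C|not B) from paired outcomes.

    Each outcome is (b_correct, c_correct) for one tested A-B, A-C item.
    """
    data = list(outcomes)
    n = len(data)
    n_b = sum(1 for b, _ in data if b)           # B recalled
    n_c = sum(1 for _, c in data if c)           # C recalled
    n_bc = sum(1 for b, c in data if b and c)    # both recalled
    if n_b == 0 or n_b == n:
        raise ValueError("conditional proportions need both B-correct and B-missed items")
    return {
        "P(B)": n_b / n,
        "P(C)": n_c / n,
        "P(C|B)": n_bc / n_b,
        "P(C|not B)": (n_c - n_bc) / (n - n_b),
    }


# Hypothetical outcomes: 2 items with both correct, 3 with only B,
# 2 with only C, and 3 with neither.
hypothetical = (
    [(True, True)] * 2 + [(True, False)] * 3
    + [(False, True)] * 2 + [(False, False)] * 3
)
estimates = recall_probabilities(hypothetical)
print(estimates)
```

Under a negative dependency the estimate of P(C|B) would fall below that of P(C|not B); in the constructed data the two conditional proportions coincide, which is the pattern that corresponds to independence.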


five assumptions. First, there is some mechanism es 
ееп А-В and A-C when both have been pesi. 
Sm is, it can cause A-C to be forgotten more die 

Previously than if A-C has been studied as the P 
mulus A, The second assumption is that A-B asso p 
The third assumption is that on the average. pe 
more likely to cause interference with retention of 3 
Ons. (In some theories, this is the same as the assump 


пег probabilities of recalling B u 
that if more interference occurs à 
& C will be lower, A straightforwar 

C|B) should be less than P(C|B). 


Y of recallin 
usion that P( 


DAPOLITO'S EXPERIMENT 1 


To examine effects of single presentations of interfering paired associates,
DaPolito used a procedure that has been termed a "miniature experiment"
(Estes, Hopkins, & Crothers, 1960). For example, a design consisting of two
reinforced trials followed by two test trials with no knowledge of results may
be represented by R1 R2 T1 T2, where each R denotes a single exposure of a
stimulus-response pair (a study presentation), and T1 and T2 denote tests on
which the subject is asked to give a response. The notation refers to the
sequence of events for a single item. The present experiment
involved a modification. Some of the pairs had relationships corresponding
to the A-B, A-C transfer paradigm. On tests, subjects were asked
to give two responses to each stimulus. (This involves a variation of a testing
procedure called modified modified free recall (MMFR); Melton, 1961.)
Although some items in the list were paired with only one response, subjects
were asked to give two responses to all items in order to avoid influencing
their performance by decisions about whether items had one or two responses.
The need to give bogus responses to some items did not seem to disturb
subjects, especially since the set of possible responses used was a well-defined
set of numerals specified to the subject in advance.


Materials and Procedure

The learning materials were two lists, each consisting of 14 symbol-number
pairs. There were three different pairings of stimulus and response terms,
each used for 12 of the 36 subjects who participated in the experiment.
The relationship between eight items in List 1 and eight in List 2 corresponded
to the A-B, A-C paradigm; four items were given A-B presentations in both
of the successive lists; the relationship between two items in each list
corresponded to the A-B, C-D paradigm. A summary of the list structure is
given in Table 7-1.

The set of responses used for any two lists consisted of the numerals
1-24. The 16 stimuli were commonly experienced symbols. They consisted
of typewriter signs: ampersand, question mark, dollar sign; geometric
figures: triangle, square; playing card forms: heart, club, diamond; Greek
letters: sigma, phi, chi; and frequently observed forms: flag, half-moon,
bell, cross. Some of the forms were used by Polson, Restle, and Polson
(1965) and are shown in Figure 3-2 in Chapter 3. Photographs of each symbol
were centered in the left half of a 5- by 8-inch white card; the numbers were
centered at the right half of each card. These locations corresponded with
the display windows of the exposure apparatus.

A given list of 14 pairs was List 1 for six subjects and List 2 for
six subjects. Six random orders were used, each being used for six subjects.
For each subject the presentation order was randomized with the restriction
that one-half the items representing each paradigm occurred in the first half
of the series of study presentations while the remaining items occurred in the
last half.

Presentation of the two lists was separated by the experimenter rereading
a portion of the instructions. On each study trial the stimulus-response pair
was shown for 4 seconds and there was a 4-second interpair interval.

Table 7-1 Relationships Among Items Presented on Successive Reinforced
Trials in DaPolito's Experiment 1

                               Training Paradigm     Number of Instances
Experimental Condition         R1        R2          in Lists

Two-Reinforcement Control      A-B       A-B         4
Interference                   A-B       A-C         8
RI Control                     A-B       (none)      2
PI Control                     (none)    C-D         2

Upon completion of the second list, the subject was instructed that (1)
tests would be given where only the stimulus terms would be presented; (2)
some of the stimuli had two correct responses while some had only one
correct response; (3) for those stimuli having two correct responses the
subject was to recall both in any order; (4) for those stimuli having a single
correct response the subject was to recall that response and then select another
number as a second response.
utes intervened between the end of EAE R 
rocedure. For each stimulus the subject was ae 
9 give two numbers. No knowledge of ИЕ 
X orders of presentation of test stimuli were ee with 
ix subjects. These test sequences were randomized d/or 
other learning presentations Ok for 
ast study trial and th? ae 
S representing each iris ui 
€ remaining items being en 
t block. Items tested in the first half of Test arch 
ce each test sequence was indep : 
ion. Test 2 followed Test 1 afte 


st d. 
-D controls, for Test 
» (35) 4.24, D: «S OL 
at the A-C items 


1(35) = 3.34, р — OL; for Test 2 
The data in Table 7- 


-etained 
may have been retain 
Table 7-2 Proportions of Correct First-List (R1) and Second-List (R2) Responses in
DaPolito's Experiment 1

                                          MMFR Test 1               MMFR Test 2
                       Paradigm       Correct      Correct      Correct      Correct
                                      First-List   Second-List  First-List   Second-List
Condition              R1      R2     Response     Response     Response     Response

Two-reinforcement
  Control              A-B     A-B    ?            (none)       .736         (none)
Interference           A-B     A-C    .534         .340         .552         .288
RI Control             A-B     (none) .542         (none)       .556         (none)
PI Control             (none)  C-D    (none)       .583         (none)       .569


Table 7-3 Proportions of Correct Response on Test 2 Conditional
on Test-1 Performance

Condition                    Item               P(C2|C1)    P(C2|I1)

Interference (A-B, A-C)      First List (B)     .961        .082
Interference (A-B, A-C)      Second List (C)    .776        .037
Control (A-B, none)          First List (B)     .974        .061
Control (none, C-D)          Second List (D)    .905        .100


less well between Test 1 and Test 2 than the items in other conditions. A
more direct indicator of retention is the conditional proportion of correct
response on Test 2 given a correct response on Test 1. These values are
denoted P(C2|C1) in Table 7-3. Proportions of correct response on Test 2
given incorrect on Test 1 are denoted P(C2|I1). The main result shown in the
table is the lower value of P(C2|C1) for A-C items compared to all the other
groups. This indicates that retention was considerably less good for the A-C
items than for the A-B items or for the controls.

The most important result of the experiment, shown in Table 7-4, gives
proportions of joint events for the tests of A-B, A-C items. The outcome
labelled $BC$ refers to the subject giving both the B and C responses on a test.
$B\bar{C}$ means that B was given correctly but C was missed. $\bar{B}C$ means that B
was missed and C was given correctly, and $\bar{B}\bar{C}$ means that neither B nor C
was given correctly on the test. The expected proportions were obtained in
the usual manner for testing independence of two variables. The probabilities
$P(B)$ and $P(C)$ were estimated from the marginal proportions of correct
responses for the two items. Then, if the responses are statistically independent,
the expected proportions of joint outcomes should be

\[
P(BC) = P(B)P(C), \qquad P(B\bar{C}) = P(B)(1 - P(C)),
\]
\[
P(\bar{B}C) = (1 - P(B))P(C), \qquad P(\bar{B}\bar{C}) = (1 - P(B))(1 - P(C)).
\]
Table 7-4 Proportions of Joint Outcomes on Tests of A-B, A-C Items
and Proportions Expected from Assumption of Response Independence

                      P(BC)    P(B, not C)    P(not B, C)    P(not B, not C)

Test 1, Observed      .201     ?              ?              ?
Test 1, Expected      .182     .352           ?              ?
Test 2, Observed      ?        ?              ?              ?
Test 2, Expected      ?        ?              ?              ?

For example, for Test 1 in Table 7-4, the marginal proportion of B is .534,
and the marginal proportion of C is .340. This gives the expected proportion
of $BC$ as .534 x .340 = .182; the expected proportion of $B\bar{C}$ is .534 x .660
= .352, and so on. As is clear from looking at the table, the hypothesis of
independence cannot be rejected in these data. The statistics obtained for the
chi-square tests of independence were $\chi^2(1) = 1.69$ for the first test, and
$\chi^2(1) = .09$ for the second test.
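The expected proportions and the test statistic can be computed as in the following sketch. The function names are ours. The marginal proportions .534 and .340 are the Test 1 values quoted in the text; the cell counts passed to `chi_square_independence` are hypothetical and are not intended to reproduce the chi-square values reported for the experiment, since the original frequencies are not given here.

```python
def expected_joint(p_b: float, p_c: float) -> dict:
    """Expected proportions of the four joint outcomes if recall of B and
    recall of C are statistically independent."""
    return {
        "B, C": p_b * p_c,
        "B, not C": p_b * (1 - p_c),
        "not B, C": (1 - p_b) * p_c,
        "not B, not C": (1 - p_b) * (1 - p_c),
    }


def chi_square_independence(n_bc: int, n_b_notc: int, n_notb_c: int, n_notb_notc: int) -> float:
    """Pearson chi-square statistic (1 degree of freedom) for a 2 x 2 table
    of joint recall frequencies, with marginals estimated from the data.
    Assumes every expected cell frequency is greater than zero."""
    observed = [n_bc, n_b_notc, n_notb_c, n_notb_notc]
    n = sum(observed)
    p_b = (n_bc + n_b_notc) / n
    p_c = (n_bc + n_notb_c) / n
    expected = [n * p for p in expected_joint(p_b, p_c).values()]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Test 1 marginal proportions quoted in the text.
test1 = expected_joint(0.534, 0.340)
print({k: round(v, 3) for k, v in test1.items()})

# Hypothetical frequencies that are exactly independent.
print(chi_square_independence(20, 30, 20, 30))
```

A statistic near zero, as for the hypothetical frequencies, gives no grounds for rejecting independence; large values would indicate a dependency between recall of the two responses.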


The chi-square test assum 
there were eight items in th 
independence can influence 


М r the 
: the largest value of chi square was obtained fo 
Best subjects (1) = 

that the independence о 


ing across Subjects whose variou 


Discussion 


ing 
eas NOt test A-B prior to present it 
linen Interference may not be as anomalous 25 


Table 7-5 Proportions of Joint о 


Utcome of 
Subjects, and Proportions Expected from R wrth at a Three Subgroups 
endence 
wo P(B) P(BC)  P(BO) 

12 Best Subjects, Observed 344 

12 Best Subjects, Expected 320 p 11S 

12 Medium Subjects, Observed .146 p .139 T 

12 Medium Subjects, Expected .145 459 .094 [^^ 

12 Worst Subjects, Observed 115 “188 095 ja 

12 Worst Subjects. Expected .098 205 .208 48 




On the other hand, the absence of retroactive interference makes a cleaner situation regarding inferences for theory. DaPolito's experiments apparently involved proactive interference without concomitant retroaction. Thus, the measures of forgetting can be used here to make inferences about the effect of A-B on retention of A-C, without uncertainty about effects operating in the opposite direction.

The difference in performance between the A-C items and the C-D controls must have been due in part to A-C items being learned less well. We know from many experiments, including those described in Chapter 6, that A-B learning produces negative transfer for A-C. In addition to weaker learning (or instead of it, depending on assumptions) there apparently was poorer retention of A-C items than of A-B and control items. We are thus inclined to infer that the proactive interference produced by A-B includes a steeper forgetting function for A-C items; the information stored about A-C is retained less well in memory because of the previous study of A-B.

The surprising finding of response independence in MMFR has implications for theory that we have already noted in a general way, and will comment on in more detail later. Although DaPolito failed to find evidence of artifacts when he divided subjects into groups of more nearly homogeneous ability, the potential importance of the finding of independence seemed to require a direct experimental test. This requirement was met in the next experiment.


DAPOLITO'S EXPERIMENT 2 

The results of Experiment 1 seem to support the inference that the interfering effect of an A-B association is independent of the strength of that association; that is, the results contradict the idea that stronger associations should have greater interfering effects. A strong test of that idea is provided if A-B associations are caused to vary in strength by receiving varying numbers of presentations. If there is equal retention of A-C in conditions where A-B varied in the number of study presentations, the empirical support for the hypothesis of independence would be strengthened considerably.



Method

As in Experiment 1, a mixed-list design was used. The training sequences used for experimental and control conditions are shown in Table 7-6. The letters R1, R2, R3, and R4 represent cycles of study trials in which subjects received paired presentations of stimulus and response terms. The two successive tests (T1 and T2) that followed the completion of R4 were unpaced MMFR-type recall tests as employed in Experiment 1.

Each row of Table 7-6 describes the sequence of paired-associate items presented over the four reinforced trials for each experimental and control condition.

Table 7-6 Sequences of Stimulus and Response Changes over Reinforced Trials for Items in Each Condition of DaPolito's Experiment 2

                               Training Paradigm         Number of
Condition            Spacing   R1    R2    R3    R4      Items
3-OL Interference              A-B   A-B   A-B   A-C     3
3-OL Control                   A-B   A-B   A-B   C-D     1
2-OL Interference    I         e-B   A-B   A-B   A-C     1
2-OL Control         I         e-B   A-B   A-B   C-D     1
2-OL Interference    II        A-B   e-B   A-B   A-C     1
2-OL Control         II        A-B   e-B   A-B   C-D     1
2-OL Interference    III       A-B   A-B   e-B   A-C
2-OL Control         III       A-B   A-B   e-B   C-D     1
1-OL Interference    I         e-B   e-B   A-B   A-C     1
1-OL Interference    II        e-B   A-B   e-B   A-C     1
1-OL Interference    III       A-B   e-B   e-B   A-C     1


А m gous 
ast reinforced cycle (R,) is ru e 
ore conventional homogencous-list | 


irs, Whereas 
D) pairs Occurred 
l conditions recei 
ler items represen 
Sponse) were inclu 


and contro 
OL, the fil 
first-list re 
first-list re 


a 
des 3 5 m menta 
ining cycle (R,). For Se rit 
‘wo paired presentations same 
led by e-B (ie. new stimulus with the = 


А merte 
ded to control for frequency of occurrence 
Shown in Tabl 


ving one or 


shen 
ef condition 1-OL (spacing 11) whe 
i recall of A-B in the l= 11) 
control condition (spatne iD 
iS compared ара nition FOR tree ^ 
his (Spacing II). Also, the 2-OL control (paci, е А-В ed 5 al con- 
trol comparison for the A-B pair in 1-OL зраст i : iun AL. contro’ 
BOUM Servet two functions and were ner a aes nm ditions 
receiving one paired Presentation during original i нр for con 
Items were presented in the same way as in Experiment 1. The materials for learning were word-number pairs. The stimulus terms were English three-letter monosyllables taken from Appendix D of Underwood and Schulz (1960). The numerals 1-30 were used as the set of possible responses. There were three arrangements of stimulus terms and response terms; each arrangement was used for 20 of the 60 subjects who participated. For ... the second-list (A-C and C-D) pairs were used as ... pairs for 10 subjects. In this case the response terms for e-B items were changed but the stimulus terms remained the same.

... for test items were also the same as in Experiment 1. There ... for presentation of test stimuli and each was randomized with the same restrictions as used in Experiment 1 except that items representing each condition were tested equally often in each half of the test blocks over every 20 subjects.



Results

The main new result involves comparison of A-C performance in the conditions with different numbers of A-B presentations. The data are shown in Table 7-7. The results obtained with different spacing of presentations have been pooled; statistical analysis showed differences in performance of A-B items that were differently spaced, but not on the corresponding A-C items that are the main focus of the analysis. Each proportion in Table 7-7 is based on 180 observations.

Analyses of variance were carried out, and showed that the large effect of number of presentations on A-B performance was strongly significant, F(2,108) = 57.4, p = nil. As in Experiment 1, there was no evidence that retroactive forgetting occurred: the small difference between A-B interference items and their controls was not significant, F(1,54) = 0.13, p > .75, nor was the interaction between interference vs. control and number of presentations, F(2,108) = 1.51, p > .10. The apparent proactive detriment on performance of A-C was reliable: A-C items were significantly different from the C-D controls, F(1,54) = 339, p = nil, and there was a significant interaction between interference vs. control and the test, apparently due to a retention loss of the C-D items that was numerically greater than the loss of A-C items between the tests, F(1,54) = 4.67, p < .05.


Table 7-7 Proportion of Correct First-List (R1) and Second-List (R2) Responses on Each Test in DaPolito's Experiment 2
MMFR MMFR 
adi Test 1 Test 2 
Paradigm А 
Condition (Spacings Pooled) pq) LRI РК) Р Ra) 
EE Interference ++ seats oe eo 2 pd 09 
SOL RIC | 3 Ве T sn 
2-OL Hes esed _ A-B. AB. A-C a es po P2 
2-01. RI Controls - AB. A- Bi cD doa 283 -700 = 
-OL РІ Controls в D sam a —— 439 
3-OL Interference A-B A-B А-В ae We x 17 .828 .328 
3-OL RI Controls A-B A-B A-B Ер . AE 744 
3-OL РІ Controls =~ — = ae === 428 
Table 7-8 Proportions of Joint Outcomes and Expected Proportions from Independence

Number of A-B
Presentations   Test              $P(BC)$   $P(B\bar{C})$   $P(\bar{B}C)$   $P(\bar{B}\bar{C})$   $\chi^2$
1               1     Observed    .133      .361            .167
1               1     Expected    .148      .346            .152                                  .31
2               1     Observed    .222      .506            .067
2               1     Expected    .210      .518            .079                                  1.22
3               1     Observed    .278      .544            .039
3               1     Expected    .261      .561            .056            .122                  1.02
1               2     Observed    .139      .322            .161            .378
1               2     Expected    .138      .323            .162            .377                  1.63
2               2     Observed    .156      .539            .100            .205
2               2     Expected    .178      .517            .078            .227                  1.25
3               2     Observed    .289      .539            .039            .133
3               2     Expected    .272      .556            .056            .116


ally 


nsignificant, F(2,108) = 1.1? 
: nteractions of number of A-B presentations wit 
experimental variables. 


test the 
hich 


А feet - i ndepel” 
“Square test statistic for at 
cement with the indepen 

Table 7-9 Shows the DE M 
f depends Conditional Probabilities 


Test 
of correct response ОП 
est] Performance 
by P(C, | С) again 


jre 
of A-C items as eed 5, 
of A-B items and con re in 
Not as large as they WE 


Table 7-9 Proportions of Correct Response on Test 2 Conditional on Test-1 Performance

Condition                                    $P(C_2 \mid C_1)$   $P(C_2 \mid \bar{C}_1)$
Interference (A-B, A-C) First List (B)       .908                .134
Interference (A-B, A-C) Second List (C)      .767                .082
Control (A-B, -) First List (B)              .888                .154
Control (-, C-D) Second List (D)             .811

Discussion

The results of this experiment provide very strong support for the conclusion that the proactive effect of A-B associations is independent of their strengths. There must have been a considerable difference in the strength of A-B associations in the different conditions: the probability of correct response was increased from .49 to .82 by increasing the number of presentations from one to three. Yet the amount of interference with A-C performance was equally great in all three of the conditions.

Given the lack of difference between conditions, it is not surprising that the analysis of items within conditions showed that responses were independent. The strength of A-B associations surely varied among items in a single experimental condition, but the amount of variation probably was not as great as that produced between conditions. In any case, the results of the statistical analysis and the result of the experimental manipulation were consistent.

The measures of retention obtained by estimating $P(C_2 \mid C_1)$ are consistent with the conclusion in Experiment 1 that at least part of the detrimental effect on A-C performance is proactive interference with retention of A-C, in addition to whatever effect there is of direct negative transfer.


PROACTIVE INTERFERENCE IN RECOGNITION 


There are several reasons for supposing that the proactive interference occurring in DaPolito's experiments should be associated with the first stage of learning rather than the second. First, the effect was obtained when items were given a single presentation. Although second-stage effects could be observed in a single presentation because for some items both stages can be accomplished on a single trial, with a single presentation of A-C, first-stage effects should predominate.

A second reason is theoretical. According to the conceptualization we developed in Chapter 4, the first stage involves storing a representation of a pair in memory, and the second stage involves learning to retrieve the representation reliably. The retrieval learning that occurs in the second stage involves important components of stimulus discrimination and learning of relations among pairs in the list. It seems unlikely that processes affecting retrieval learning would have very much effect in DaPolito's experiments, since the sequence of study trials had the character of a series of individual items. Subjects probably did not develop very much of a retrieval system for any of the items; therefore the difficulty with A-C items should not be due to a less efficient retrieval system for those items.

A third reason for supposing that DaPolito's proactive interference operates on storage, rather than on retrieval, is connected with the theory of retroactive interference that we will discuss in Chapter 8. We will argue that retroactive forgetting may be due primarily to interference that occurs between retrieval systems in which different organizations are required for retrieval of the A-B and the A-C items. On this hypothesis, the absence of any measurable retroactive interference would be a further indication that retrieval learning played little or no role in the effects observed in DaPolito's experimental situation.


If these arguments and hypotheses are correct, we should expect to find proactive effects in recognition memory of the same order of magnitude as those observed in recall. A recognition test requires retrieval of less information than a recall test; it therefore provides a more direct measure of stored information. If it were found that proactive effects of the kind observed in DaPolito's other experiments were greatly reduced when testing was by a recognition method, we would have reasons for doubting several aspects of the analysis developed thus far.



Method 


Materials for learning w 
Experiment |, There were 
each cycle. In both 


.vcle 
Не ћ у e first CY 
ddition, there were 6 A-B items in the ! 


ср items 
ear in the second cycle; and similarly. 6 ed as 
in the second cycle had 


These additional 12 items Se нећу 
and proactive interference. controls. resp emet 
sponse terms, each pei 
d in the experiment. Inst! : 
as in Experiments | and force - 
pleted, a two-alternative s wa 
Pair. In each test, a stimulus wa 
merals (both from the set 1-24), of which ops 


ng the 
mi sti 
nitton is that order of testi? ir. 

t 
the first response ошр 


For items in the A-B, 


Of the test block wj Я ding." 
i i w sponding 
pairs tested in the last half; the remainin pe hein uuu 


ast 
7C pairs were tested in epe 1 
n Were si. 1118 А-В pairs in the first М 
ere similarly divided between the two ha 


their Corres 
io 
of the sequence of test trials, 
Prior to the test, subjects were instructed as follows: (1) On every trial a symbol would appear with two numbers: one correct number and one incorrect number. (2) In every case, only one number was correct. (3) The subject was to pronounce the correct number previously associated with the symbol. No explicit mention was made of the fact that some symbols had two correct numbers and would appear twice in the list. If the subject asked whether a symbol would appear more than once he was told that this might occur. Each test item was presented for 4 seconds with a 4-second interitem interval.


Results 
Performance on the recognition tests is reported in Table 7-10. Performance was worse in the second half of the test block for all conditions. There was significant proactive interference for items tested in both halves, evidenced by reliable differences between A-C items and C-D controls, first half: t(44) = 5.14, p < .001; second half: t(44) = 2.67, p < .02. The difference between A-B interference items and RI controls in the second half of the test block was not significant, t(44) = 1.42, p > .10. The analysis of the hypothesis of independence is given in Table 7-11. The result with recognition corresponds to all the previous tests; the data were consistent with the independence hypothesis.


Discussion
The experiment showed sizable proactive interference in recognition, consistent with expectations based on the earlier results. Performance on


Table 7-10 Proportions of Correct Choices in DaPolito's Recognition Experiment

               Paradigm       Half of        Proportion Correct   Proportion Correct
Condition      R1     R2      Test Block     R1 Recognitions      R2 Recognitions
= 2 
Interference A-B A-C First E pe 
Dterference A-B A-C Second ma 681 
енене АВ. АС Раб En i 
| l Control A-B = First 352 
RI Control A-B = Second "863 
Control ASB = Pooled 299 Бү 
Control — -P First "807 
Control so GED: Second 859 
Control = GD Pooled E 
Table 7-11 Proportions of Joint Outcomes and Expected Proportions Assuming Independence for DaPolito's Recognition Experiment

Pair Tested in   Pair Tested in
First Half       Second Half                $P(BC)$   $P(B\bar{C})$   $P(\bar{B}C)$   $P(\bar{B}\bar{C})$   $\chi^2$
A-B              A-C           Observed     .578      .304            .067            .051                  .52
A-B              A-C           Expected     .569      .313            .076            .042
A-C              A-B           Observed     .570      .215            .148            .067
A-C              A-B           Expected     .564      .221            .154            .061

A-B and A-C items was independent, also making the recognition performance consistent with earlier findings obtained with recall tests.



IMPLICATIONS FOR THEORY 
The empirical fact of independence of A-B and A-C performance in the mixed-list situation used by DaPolito seems thoroughly established. Consequently, we need to discuss what conclusions are required regarding theory of proactive interference.

In our view, th

think that 
bstan- 


sible 
hypotheses that 


include hypotheses of resp 
and interference betv 
We will start by givi 
and then argue that 
we have mentioned, 
The theoretical šes Such as response сотре! accom 
not arise from the finding of independence alone; that fact could be ferent? 
modated if i ant finding of proactive gos that i$ 
is must postulate a реци 
in retention of A-C that was ? polito * 
and A-C at the same time. pa > intel” 
, > especially incisive, partly because there was no retroactive lie 
ference in his experi OWS theoretic: pP га 
å (епі 
д jist 


Я h 

able for explaining proactive forgetting: traces: 
on, assimilation of EE тый. 
at the time of encoding the petitio 
ailed argument against response pim 
considerations apply to the other hyl 


veen associations 
ng а det 
the same 


es 
ition 99 


: you 
e i rence v 
have to have been strongly asymmetric _ nts, such interfe 
with the C response, but not vi 


ce versa response apparently me n 
an interfering effect just because it Was resent роте ouium 
because the interfering response is connected with the stimulus in the subject's memory. Thus, some learning of the A-B association has to be postulated for there to be an effect due to response competition.

A reasonable expectation might be that if an A-B association had not been learned well enough for the B response to occur on the MMFR test (or to be chosen in a recognition test), it would not be strong enough for the B response to produce interference. We have worked out the implications of this idea for DaPolito's first experiment. For Test 1, the data give the estimate P(B) = .534. Suppose that without interference, the C responses would be just as likely as the B's, that is, $P(C \mid \bar{B}) = P(B)$. (It would be just as reasonable to equate $P(C \mid \bar{B})$ with the proactive interference control items, which would give nearly the same result for these data.) A second item of data is that P(C) was .340. By the rule of total probability,

$$P(C) = P(C \mid B)P(B) + P(C \mid \bar{B})P(\bar{B}).$$

Substituting the empirical values for P(B), $P(\bar{B})$ and P(C), and using the assumption that $P(C \mid \bar{B}) = P(B)$, we can solve for $P(C \mid B)$:

$$.340 = (.534)(.466) + P(C \mid B)(.534); \qquad P(C \mid B) = .170.$$

With values of P(B) and $P(C \mid B)$ we can calculate theoretical values for the joint outcomes; for example,

$$P(BC) = P(B)P(C \mid B) = (.534)(.170) = .091.$$

The theoretical values obtained in this way are shown in Table 7-12, along with the empirical values repeated from Table 7-4, and goodness-of-fit chi-square statistics. This hypothesis clearly is well outside the acceptable range.
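The arithmetic in this argument can be traced step by step. The sketch below is an illustration only; the function name `competition_prediction`, its argument names, and the dictionary keys are assumptions introduced here. It solves the total-probability equation for P(C | B) under the assumption that P(C | not B) equals P(B), then builds the four predicted joint proportions.

```python
def competition_prediction(p_b, p_c):
    """Joint proportions predicted if P(C | not B) = P(B).

    p_b, p_c: observed marginal proportions of correct B and C responses.
    Returns P(C | B) and a dict of predicted joint proportions, using
    hypothetical keys where upper case = correct, lower case = incorrect.
    """
    p_c_given_not_b = p_b  # the assumption stated in the text
    # Total probability: P(C) = P(C|B) P(B) + P(C|not B) (1 - P(B)).
    p_c_given_b = (p_c - p_c_given_not_b * (1 - p_b)) / p_b
    joint = {
        "BC": p_b * p_c_given_b,
        "Bc": p_b * (1 - p_c_given_b),
        "bC": (1 - p_b) * p_c_given_not_b,
        "bc": (1 - p_b) * (1 - p_c_given_not_b),
    }
    return p_c_given_b, joint


# Test 1 marginals quoted in the text.
p_c_given_b, joint = competition_prediction(0.534, 0.340)
print(round(p_c_given_b, 3))
print({key: round(value, 3) for key, value in joint.items()})
```

With the Test 1 marginals, P(C | B) comes out close to the .170 given in the text, and the predicted proportion for BC rounds to .091, well below the observed .201.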

The assumption that $P(C \mid \bar{B}) = P(B)$ was used in a way that made a correct prediction about the amount of negative transfer. It made a very inaccurate prediction about the dependence between B and C. Is it possible for a different hypothesis that still assumes response competition to predict independence and still allow the amount of proactive interference that was obtained? There seem to be three possibilities; unfortunately, one of them seems very implausible and the other two very improbable.


The implausible solution is to assume that all the items studied receive sufficient associative strength to produce equal amounts of response competition. The difficulty is in assigning equal effect in interference to responses that are clearly unequal in other ways, namely, retrievability in a recall test and recognition in a forced-choice test. On the other hand, it could be assumed that a study trial always produces at least a small increase in associative strength, and that the maximum interfering effect of A-B on A-C is achieved with whatever amount is taken to be the minimum increase.

Table 7-12 Proportions of Joint Outcomes and Proportions Expected from Assumption that $P(C \mid \bar{B}) = P(B)$

                    $P(BC)$   $P(B\bar{C})$   $P(\bar{B}C)$   $P(\bar{B}\bar{C})$   $\chi^2$
Test 1, Observed    .201                                      .326                  75.3
Test 1, Expected    .091                                      .217
Test 2, Observed    .163                                      .323                  128.3
Test 2, Expected    .040      .512            .248            .200

The improbable solutions assume that the amount of proactive interference does depend on the associative strength of the A-B item, but that another factor causes the quantitative relationships found in the data. These ... to cancel the expected dependence. ... that would tend to obscure negative ... stimulus differentiation. It is quite ... A-B associations also have relatively well differentiated stimuli, and good differentiation should facilitate ... that when two associations with the same stimulus ... can be stored in the same retrieval network, but they will be found at different terminal nodes of the network. The simplest case is diagramed in Fig. 7-1, which shows a feature test that presumably is preceded by a number of other tests, but is the last test ... identification of two associations, A-B and A-C. Suppose that p is the probability of reaching the feature just above these two associations when the subject tries to retrieve either of the items. Then let q be the probability of retrieving A-B, given that the search reached the final feature, and let r be the probability of retrieving A-C, given that the final feature was reached.

If separate tests were given for the two items we would have

$$P(B) = pq, \qquad P(C) = pr,$$


Figure 7-1 Portion of a retrieval network showing hypothetical links needed to retrieve A-B and A-C items.

and if independence of the tests could be assumed,

$$P(B \cap C) = p^2 qr.$$

However, for a MMFR test, it is more reasonable to assume that an attempt is made to search the network and retrieve both of the items. On a single test of both items, we would have

$$P(B \cap C) = pqr, \qquad P(B \cap \bar{C}) = pq(1 - r), \qquad P(\bar{B} \cap C) = p(1 - q)r,$$
$$P(\bar{B} \cap \bar{C}) = (1 - p) + p(1 - q)(1 - r).$$

From this it follows that

$$P(B) = pq, \qquad P(C) = pr$$

as before, but P(BC) is pqr in the MMFR situation.

The finding of independence between retention of A-B and of A-C is that P(C) remains constant in spite of variations in P(B). The finding was clearest in DaPolito's Experiment 2, where variation in P(B) was produced experimentally. If something like Figure 7-1 represents the situation, independence of A-B and A-C means that pr was constant, even though pq was considerably larger for items with more presentations. According to the hypothesis of response competition, r should become smaller if q is increased. But both p and q should increase with the number of presentations of A-B. Therefore, r could decrease because of response competition, but an increase in p could compensate for that, leaving pr approximately constant. The only difficulty with this explanation is the coincidence that must be assumed regarding the compensating changes in p and r. The conclusion that r was simply unaffected by changes in q is more parsimonious, and more falsifiable in future research, and thus seems a potentially more productive conclusion.

The other factor that can be considered in trying to explain independence of A-B and A-C retention is variation of parameter values. The values of p, q, and r almost certainly are not constant, despite efforts to construct relatively homogeneous sets of items and sample subjects from a relatively homogeneous population. Furthermore, the parameter values for subjects and items probably do not vary independently. Skillful subjects probably have relatively high values of all the parameters for most of the items, and some items with stimuli easy to relate to responses might have relatively high values of all the parameters.

To analyze effects of possible intersubject and interitem differences, we must consider parameters as random variables. If the parameters of the process are p, q, and r, then in an MMFR test,

$$E[P(B)] = E(pq), \qquad E[P(C)] = E(pr), \qquad E[P(B \cap C)] = E(pqr).$$

The possible relationships among three varying parameters are very complicated, but our main theoretical interest here has to do with dependencies between q and r, so for simplicity, assume that p is a constant. Then

$$E(pq) = pE(q), \qquad E(pr) = pE(r), \qquad E(pqr) = pE(qr).$$

By the definition of covariance,

$$E(qr) = \mathrm{Cov}(q, r) + E(q)E(r) = \mathrm{Cov}(q, r) + \frac{1}{p^2}\,E[P(B)] \cdot E[P(C)].$$

We are interested in conditions where retention of B and C are independent; that is,

$$E[P(B)] \cdot E[P(C)] = E[P(B \cap C)] = E(pqr) = p\,\mathrm{Cov}(q, r) + \frac{1}{p}\,E[P(B)] \cdot E[P(C)].$$

A few algebraic transformations show that B and C are independent if and only if

$$E[P(B)] \cdot E[P(C)] = -\frac{p^2}{1 - p}\,\mathrm{Cov}(q, r).$$
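The single-test retrieval model and the independence condition can be explored numerically. The following sketch is illustrative only: the function names `mmfr_joint` and `dependence` and the example parameter values are assumptions introduced here, not taken from the text. With p held constant, it evaluates E[P(B ∩ C)] − E[P(B)]·E[P(C)] over a small population of (q, r) pairs, which by the algebra above equals p·Cov(q, r) + p(1 − p)E(q)E(r).

```python
def mmfr_joint(p, q, r):
    """Joint outcome probabilities when both items are sought on one search.

    p: probability of reaching the final feature test,
    q, r: probabilities of retrieving A-B and A-C once it is reached.
    Keys are hypothetical labels: upper case = recalled, lower case = not.
    """
    return {
        "BC": p * q * r,
        "Bc": p * q * (1 - r),
        "bC": p * (1 - q) * r,
        "bc": (1 - p) + p * (1 - q) * (1 - r),
    }


def mean(values):
    return sum(values) / len(values)


def dependence(p, pairs):
    """E[P(B and C)] - E[P(B)] * E[P(C)] for constant p and equally likely (q, r) pairs."""
    e_b = mean([p * q for q, _ in pairs])
    e_c = mean([p * r for _, r in pairs])
    e_bc = mean([p * q * r for q, r in pairs])
    return e_bc - e_b * e_c


# Constant q and r (zero covariance): the dependence is positive whenever p < 1.
print(dependence(0.8, [(0.6, 0.5)]))

# Hypothetical population with negative covariance between q and r,
# chosen so that Cov(q, r) = -(1 - p) E(q) E(r); the dependence vanishes.
print(dependence(0.75, [(0.75, 0.25), (0.25, 0.75)]))
```

In the second case the negative covariance exactly offsets the positive dependence that a shared p produces, which is the kind of coincidence the independence condition requires.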


... since $p^2/(1 - p)$ will be positive. ... probable that the values of p and Cov(q, r) should ... just equal the value of $E[P(B)] \cdot E[P(C)]$ ... would be obtained in so many cases. It seems to us that ... learning produces ... they are combined with another ... must be so precise that the assumption ...
In our own search for an explanation of DaPolito's results we have been led to the hypothesis of interference based on stimulus encoding that was presented at the beginning of Chapter 6. We believe that the representation of the stimulus in memory will generally be influenced by the subject's effort to ... stimulus-response pair. And if we assume that the encoding for relating ... tends to persist, it will generally be a poorer representation for relating the stimulus to another response in A-C than would be likely if the stimulus had not been in the A-B pair.

Our hypothesis of persistent encoding falls into the same class as the implausible ... of response competition, trace assimilation, and associative .... That is, it explains independence by assuming that mere study of the A-B items produces whatever interfering effect there will be, whether or not the study leads to successful learning of the A-B pair. We think this ... is more plausible regarding stimulus encoding than it is regarding the other mechanisms considered because, under conditions generally ... paired-associate experiments, recognition of stimuli should be easy after a single trial. To study relations between stimulus recognition and recall of association, it is necessary to use longer lists and more confusable items than are necessary to make a reasonable task for learning associations (see Martin, 1967, for an example). Recognition of a stimulus ... an encoded representation of the stimulus in memory. ... to suppose that, at least with relatively distinctive stimuli, the encodings given stimuli by subjects are likely to persist and influence ... to learn.

It may be that the empirical finding of independence depends on having a situation where A-B items are not organized to different degrees for retrieval. Underwood (1949) presented an A-B list for different numbers of trials, and then presented an A-C list to a criterion of one perfect trial. Retention of the A-C list was less in conditions where the A-B list had been studied longer. We would suppose that the A-B retrieval system produced interference with A-C retention, causing more interference in cases where subjects were better organized for A-B retrieval. Runquist's results are consistent with this interpretation. He ranked the A-B items by their inferred strength, using the number of times they were given correctly during original learning. There was no dependence of recall of A-C items on the inferred strength of the A-B interfering items. This fits with DaPolito's results, and shares with DaPolito's situation the feature that stronger and weaker items were all presented together, not allowing the "strong" items to become integrated into a more coherent retrieval system.


chapter 8 


hen 
ws 


4 15 tł 
een learned and the subject p? 
: f retention for А-В generally m 
ning has interf, T А-В re 
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stimuli i 

uh а with the A-B list were presented. The other mecha- 
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o retain information about previous responses but still per- 
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The node previously located at that branch is pushed down to the second 
location in the stack, where it is held in storage.
A retrieval network with pushdown storage at each branch is an extremely flexible system. It is possible for such a system to learn not to respond in the old way before a new element is added, by pushing down the stack and adding a null symbol in the top cell of the slot. Various mechanisms are possible for the system to err; one of the easiest to implement would be a probability of mistakenly retrieving the second item in the stack. Another straightforward system would involve a nonzero probability of losing the item from the top of the stack between the addition of the new branch and the next test of the item for which it was developed. This would cause the more recently built branch to be lost from memory and the previous branch to occupy the top position.
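A pushdown slot of this kind is easy to sketch as a small data structure. The code below is a hypothetical illustration, not the authors' model: the class name `BranchSlot`, its method names, and the two error parameters are assumptions introduced here to mirror the two error mechanisms just described.

```python
import random


class BranchSlot:
    """One branch of a retrieval network with pushdown storage (illustrative)."""

    NULL = None  # stands in for the null symbol described in the text

    def __init__(self, p_second=0.0, p_lose_top=0.0, rng=None):
        self.stack = []               # index 0 is the top cell
        self.p_second = p_second      # chance of retrieving the second item by mistake
        self.p_lose_top = p_lose_top  # chance the top item is lost before the next test
        self.rng = rng or random.Random()

    def push(self, node):
        """Add a new node; the previous occupant moves down one cell."""
        self.stack.insert(0, node)

    def suppress(self):
        """Learn not to respond in the old way: push a null symbol on top."""
        self.push(self.NULL)

    def retrieve(self):
        """Return the node found on a test, applying both error mechanisms."""
        if len(self.stack) > 1 and self.rng.random() < self.p_lose_top:
            self.stack.pop(0)         # newer branch lost; older branch is on top again
        if len(self.stack) > 1 and self.rng.random() < self.p_second:
            return self.stack[1]      # mistaken retrieval of the pushed-down item
        return self.stack[0] if self.stack else self.NULL


slot = BranchSlot()
slot.push("A-B")
slot.push("A-C")
print(slot.retrieve())  # error probabilities are zero, so the top item is returned
```

Setting `p_lose_top` or `p_second` above zero lets the older A-B node reappear on a test, which is one way to represent recovery of first-list responses in such a system.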


Figure 8-2 illustrates a network with some pushed-down branches. The network shown has the same stimuli as the one diagrammed in Figure 2-7, but the responses have been changed, and the network has been modified mainly by adding branches just above the terminal nodes. This is the simplest kind of modification and is the kind of change we think is probably typical in the A-B, A-C paradigm.
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then recognize GAC-1. In effect, this technique makes a single retrieval net- 
work for both lists, using features that discriminate between the lists in 
exactly the way that features of items are used. : 

Although we can conceive of situations where list-context information 
would be stored as features to be tested in the retrieval network, that tech- 
nique seems unlikely as a general procedure. For one reason, in the inter- 
polated list, tests for list membership are not needed and their addition to the 
network would add unnecessary complexity to the retrieval structure. 

The second way to incorporate contextual information is to store it when 
the memory stack is pushed down along with the information that specifies 
a feature to be tested or a pattern that is described in the terminal node. Ц 
this were done, the subject would have the information needed to retrieve 
items from the first list, but the information would have to be found by 
searching through memory in a fairly complicated way. Specifically, information
would have to be taken out of memory stacks and tested for the contextual
information found as part of the information. This idea is similar to the
two-stage mechanism that Kintsch (1970) hypothesized for recall, in that it
involves retrieval of items from memory and an editing to select the items
that belong in the set being searched for.
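
The two-stage idea described above (retrieve candidates, then edit them by their stored list context) can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `StoredItem`, `retrieve_candidates`, and `edit_by_context` are hypothetical, and the pair GAC-7 is an invented second-list item added for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredItem:
    stimulus: str
    response: str
    list_tag: int  # contextual information stored when the stack was pushed down


def retrieve_candidates(memory, stimulus):
    """Stage 1: pull every stored item for the stimulus, whatever its list."""
    return [item for item in memory if item.stimulus == stimulus]


def edit_by_context(candidates, wanted_list):
    """Stage 2: keep only candidates whose stored context matches the target list."""
    return [item.response for item in candidates if item.list_tag == wanted_list]


# Hypothetical memory contents; GAC-7 is an invented interpolated-list pair.
memory = [
    StoredItem("GAC", "1", 1),
    StoredItem("GAC", "7", 2),
    StoredItem("LAJ", "3", 1),
]

first_list_answer = edit_by_context(retrieve_candidates(memory, "GAC"), 1)
```

In this sketch the list tag plays no part in the first stage, which is what makes the search "fairly complicated": every candidate has to be taken out and inspected before it can be accepted or rejected.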


According to this group of ideas, subjects always have the information 
needed for correct performance on a retention test; the problem is whether 
they can find it. On this point we agree with a majority of theorists
(e.g., Anderson & Bower, 1973; Shiffrin, 1970) who have argued that the
[...] mainly on the complexity of the
search process required to retrieve the item [...]

In most theories, difficulty of search [...]
still another important factor. If the
[...] from the top down, there will [...]
to sections of memory containing [...]
if the search process neglects to test [...]
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work have to be used for many of the interpolated Cia га; pa 
connection to "FINAL J?" would be used each Ah ones In Figure ae 
or YEB-BILL was retrieved in the interpolated list IM RIW-STE a 
retrievals are used on fewer occasions—the >t By contrast, lower-leV 
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used only when that individual item is tested. It might be expected, then, that 


subjects would build up stronger tendencies to ignore first-list information
stored at higher levels in the network, making retrieval of LAJ-3 less likely
than retrieval of items like RZW-5 in many test conditions. 

What do these ideas imply about forgetting in the various experimental
paradigms? In the control condition C-D, the usual case probably involves
developing a new retrieval network for the items in the second list. It is known
that retention of A-B items is not perfect after C-D interpolated learning.
The most likely explanation is that some stimulus features shared by the A-B
and C-D items are used in the retrieval network for C-D after being used for
A-B. Development of the C-D network would then cause some nodes in the
A-B network to be pushed down into memory, and retrieval of these nodes
could be impaired. An interesting implication is that if stimuli in A-B and
C-D share more features, learning of C-D should cause more forgetting of
A-B; this agrees with the findings (Gibson, 1940).

At times, forgetting of A-B is greater after C-B interpolated learning than
after C-D learning, although the difference is generally very small. Recall our
assumption that subjects encode a representation of each stimulus-response
pair, using relational properties of the two elements when they can. This
assumption implies that when the same responses are used in the two lists,
stimulus properties shared between the lists are more likely to be included in
the encodings. The situation is potentially quite complicated. There can be
instances where a feature used in identifying an A-B pair will also be useful
in identifying the C-B pair with the same response. This can lead to
facilitation, both of second-list learning and, we would suppose, of retention.
On the other hand, it need not. We think facilitation would be produced if the
shared feature were used to form a grouping of items in the way discussed in
Chapter 5. Otherwise, the feature would be embedded in a network of feature
tests, and when the feature has been tested, different consequences would
generally follow in the C-D network than did in the A-B network. This
situation would produce pushdown storage of the nodes of the A-B network
and consequent difficulty of retrieving the A-B item or items involved. It
must be remembered that the difference between retention in the C-B and C-D
conditions is generally small and often negligible. Retrieval networks consist
of tests on stimulus features, and when new stimuli are used, the presence of
old responses apparently has only small effects in increasing the probability
of using the same features for C-D that were used for A-B.

The amount of forgetting after A-C learning should be considerably
greater than in the C-D and C-B conditions, because the same stimuli
are used in the two tasks and there will naturally be greater overlap between
the features used in the interpolated and the A-B retrieval networks. The
diagram shown in Figure 8-2 illustrates what we propose as a typical result of
interpolated learning in the A-C network. Most of the retrieval network from
the A-B list remains intact and most of the pushdown storage is at
low-level nodes. Unlike the situation for C-D and C-B, the A-B network has
been modified to permit retrieval of new items, and the standard
findings regarding forgetting in A-C indicate that subjects do not find it easy
to retrieve the first-list items after the interpolated items have been
incorporated into the network.
Note that the network used for A-C in Figure 8-2 uses nearly all the same
stimulus features as were used for A-B items. This is a consequence of the
assumption made in Chapter 6 concerning negative transfer: that
subjects will tend to persist in their encodings of stimuli from List 1. As a
further consequence, the encodings of items in the A-C list will often make
learning and retention more difficult. Figure 8-3 shows a network for retrieval
of A-C items that might be developed by a subject who had no prior
experience with the stimuli. Several relations among responses based on phonemic
similarity correspond to the stimulus features built into the network of
Figure 8-3, and if these [...]
retain the list more easily [...]
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can sort through the retrieval network and get quite close to the terminal 
nodes where the associations are stored. If the first-list responses are presented 
in the test, they can easily reinstate the last link or so needed to determine 
which response went with which stimulus. However, after the higher-level 
modifications that we suppose occur in A-Br learning, the features used in
retrieving associations prevent the subject from getting near some of the
associations, and the presentation of the first-list responses (which of course
are also the second-list responses) is not a sufficient cue for retrieval of the
A-B associations.
Martin (1972) identified another possible reason for the good retention of
A-B items on multiple-choice tests. Martin's idea relates to the hypothesis
that stimulus features used to represent associations are those that fit into a
relational unit involving the response. We think that subjects generally persist
in the initial encoding, but a change in the response paired with the stimulus
can bring about a change in the features used in the pair's encoding (Weaver,
1969). To the extent that different features of stimuli are used in the retrieval
of second-list associations, presentation of the first-list responses should serve
as reminders of the stimulus features needed to retrieve the first-list items. This
would be more effective in A-C than in A-Br, because during A-Br learning,
subjects continue to see the responses previously paired with all the stimuli
and must repress features used for first-list retrieval to permit successful
performance on the transfer list.


ASSOCIATIONIST THEORY OF RETROACTION 

In the years preceding World War II, associationist theory was strongly
developed by American functionalist psychologists. These scientists were
[...] analyze and appreciate association as a solution for [...]
genesis of abstract ideas. They wanted to understand [...] associations
have for persons, and their theoretical questions had a [...] aspect.
Much attention was given to the question of [...] are forgotten,
and the main outlines of interference theory were developed.

McGeoch (1942) developed the idea of response competition,
which stated that failure to respond correctly on a test of material learned
earlier is attributed to the existence of associations other than the one needed,
and results in [...] among alternative responses. Results obtained by
Melton and Irwin [...] convinced them that competition among responses
was not a sufficient [...] to explain important aspects of forgetting, and they
introduced the [...] that associations learned earlier would become
unlearned if later associations of an interfering kind had to be memorized.
Associationist theory was thus extended to the analysis of forgetting: unlearning
and competition were the main operative concepts in explaining [...]

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Recent analyses have considered detailed aspects of associative unlearning. 
Major components of the current associationist theory of retroaction were 
presented by McGovern (1964); thorough reviews and discussions of the
theory have been given by Keppel (1968) and Postman and Underwood (1973).
McGovern's analysis used the concept of unlearning and dealt with forward,
backward, and contextual associations. The main idea is that if there is a
first-list association from x to y, then if x appears in the second list
without y, there will be some unlearning of the first-list association. The most
likely mechanism for producing unlearning is thought to be elicitation of the
first-list response during learning of the second list, a condition in which the 
response is not reinforced. 

The theory implies that forward associations from A to B will tend to be
unlearned during interpolated learning of A-C or A-Br. Backward associations
will tend to be unlearned when the interpolated list is C-B or A-Br. And
contextual associations will tend to be unlearned during interpolated learning
of A-C or C-D. Assuming that the most important factor in retention is the
[...]
this theory explains why forgetting is greater [...]
learning than after C-D or C-B. Whether [...]
forgetting depends on the relative importance [...]
unlearned in A-C) and backward associations [...]
The same is true regarding C-D and C-B interpolated
learning.


implicati ИТ а unit. Postman (1963а) considered 
implications of the response selector mechanism regarding retroactive mwer 
rongly influence the availability © 


COMPARISON OF THEORIES 


In preceding chapters, associationist and cognitive theories were shown to
diverge at numerous points. This has led to discussion of [...] that we
conclude favor the views we characterize as a cognitive [...]
With regard to retroactive forgetting, our theory has much in common with
the associationist theory as that has developed in recent years. Although
things happen in different ways according to the two theories, many of the
same things do happen.

According to associationist theory, first-list associations are unlearned 
during second-list learning. According to the cognitive theory we have pre- 
sented, first-list associations are pushed down in memory storage and are 
therefore harder to retrieve. These two mechanisms should
produce similar effects. For example, the idea that unlearning occurs through
a process analogous to extinction of a conditioned reflex has led association
theorists to expect spontaneous recovery of unlearned associations. The
experimental evidence has been uneven, but some studies report an apparent
increase in strength of first-list items during a retention interval following
interpolated learning (see Keppel, 1968; Postman & Underwood, 1973).
Increase in performance on first-list items relative to second-list items, or even
absolute increases in first-list performance, would not be surprising in the
theoretical framework we have presented. The simplest hypothesis would
suggest rearrangement of the pushdown stacks during retention intervals,
with items lower in the stack moving up and taking the places of items that
had occupied higher positions. Indeed, Hintzman (1968) assumed such a
mechanism in his assumption that responses were stored in pushdown stacks
at terminal nodes of SAL's discrimination net.
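
As a rough illustration of the pushdown idea, the sketch below stores responses at a terminal node in a stack and shows how a rearrangement could bring a first-list response back to the top. The class and method names are hypothetical, and the rearrangement rule is an assumption made for the example; it is not SAL's actual procedure.

```python
class TerminalNode:
    """A terminal node holding responses in a pushdown stack (top first)."""

    def __init__(self):
        self._stack = []  # index 0 is the top of the stack

    def push(self, response):
        """Store a new response on top, pushing earlier ones down."""
        self._stack.insert(0, response)

    def top(self):
        """Return the response that would be retrieved first, if any."""
        return self._stack[0] if self._stack else None

    def recover(self, depth):
        """Hypothetical rearrangement: move the item at `depth` to the top."""
        item = self._stack.pop(depth)
        self._stack.insert(0, item)

    def contents(self):
        return list(self._stack)


node = TerminalNode()
node.push("B")   # first-list response
node.push("C")   # interpolated response pushes B down
after_learning = node.top()
node.recover(1)  # rearrangement during the retention interval
after_interval = node.top()
```

Nothing is erased in this representation: the first-list response stays in the stack throughout, and only its position changes.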

There is a feature of the cognitive theory that might lead to a
differentiating prediction from associationist theory if the associationist theory did
not include the idea of response-set interference. Since the cognitive theory
assumes that pushdown storage can occur anywhere in the retrieval network,
on some occasions subsets of items should be retrieved or forgotten as a
group. It should be possible, then, to construct paired-associate lists in which
the relations among items would produce the kinds of clusters whereby
subjects would tend to recall either all or none of the items in the cluster.

Although such experiments would add to our knowledge about the organization
of associative memory, they would not differentiate between cognitive
and associative theories, because the foreseeable results could also be
explained using the concept of response-set interference. If subjects do form
cognitive structures analogous to the list of responses, then the formation of
those structures probably involves learning subgroups of responses similar
to the grouping of responses that occur during free recall learning (Mandler,
1967; [...]), and these subgroups would show the kinds of clustering
that are predicted from the theory of retrieval networks.

In case of competition between specific responses, as with unlearning of
[...], it seems that the predictions of cognitive theory presented here
do not differ from those of the associationist theory. The storage of two or
more [...] pushdown stack makes possible the uncertainty in retrieval,
and the results of such uncertainty would probably be indistinguishable from
competition between responses.

The main departure between the theories involves the concept of response-set
interference, but even here the ideas have much in common. The hypothesis
of competition between sets of responses provides associationist theory
with a mechanism that affects subsets of items or the whole list, and such a
mechanism clearly is required. The hypothesis of response-set interference is
consistent with ideas of response learning, and it would be expected that a
coherent response pool might be formed as a part of the process of response
learning during learning of a list. However, evidence presented [...]
seems to us to argue against the process of response learning as this has been
developed, and therefore also reduces the plausibility of the idea of
response-set learning.


On the other hand, the idea of interference between retrieval plans follows
directly from the hypothesis that a retrieval network is acquired by subjects in
learning to retrieve items from the list. This hypothesis is supported by the
evidence of strong effects of stimulus variables in later stages of learning, as
well as by the finding that most negative transfer appears to operate in the
second stage of learning, when the retrieval network is assumed to be
acquired. Thus, it seems to us that the idea of a retrieval network is at least as
plausible as the idea of a pool of responses. However, both conceptualizations
assign major significance to processes operating at the level of the list,
an important similarity between the two.


JAMES’ EXPERIMENT 


[...] Chapter 6. Recall that [...] interpolated learning in four different
paradigms: [...] In addition to the groups with interpolated training to [...]
perfect recitation (the data reported in Chapter 6), [...]
paradigm was given just two study [...] on the interpolated
list, after learning the A-B list to criterion.

Following the interpolated learning, subjects were given a series of tests to
measure their retention of the initial A-B items. The first test was an MMFR
test in standard form. The subject was given a sheet of [...]
from both lists in random order and was asked to write the correct
response or responses beside each stimulus term. The subject was allowed
to work as long as she or he thought fruitful, then was asked to [...]
additional responses not yet written that she or he could remember. Finally, the
subject was asked to indicate the list membership of each stimulus.

Following MMFR, the subject was given a response completion task. Each
response from the first list was tested by presenting a frame formed by deleting
all letters except the first and one other letter. Two different sets of frames
were constructed, and one-half of the subjects in each paradigm were tested
on each set. For example, the response MANLY was tested using the frames
M - - - Y and M A - - -.
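
The frame construction can be illustrated with a short sketch. The function name `completion_frames` is hypothetical, and the code simply enumerates every frame that keeps the first letter plus one other; how James chose the two frames per response is not stated, so no selection rule is assumed.

```python
def completion_frames(response):
    """Return every frame that keeps the first letter and one other letter.

    Deleted letters are shown as dashes, with letters separated by spaces.
    """
    frames = []
    for keep in range(1, len(response)):
        cells = [ch if i in (0, keep) else "-" for i, ch in enumerate(response)]
        frames.append(" ".join(cells))
    return frames


frames_for_manly = completion_frames("MANLY")
```

For a five-letter response this gives four possible frames, of which the two quoted in the text are the first and the last.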

In a third test, recall of List-1 responses was measured. The subject was
given a sheet of paper containing the stimuli from the first list and was asked
to write the correct response from List 1 beside each stimulus.

The final test in the series was a matching test for the List-1 items. The
subject received a sheet of paper with the stimuli and responses from the first
list and was asked to match each stimulus with its correct response, matching
all the stimuli and using no response more than once.

On all four tests, subjects were encouraged to get as many correct as
possible. There was no time limit on any of the tests. Intervals between successive
tests were approximately 1 minute.


Results 

A summary of James' main results is given in Table 8-1. These data are
from conditions where interpolated learning was taken to criterion. We will
discuss the results of each test in some detail; however, note the main finding
that on the MMFR test given immediately after interpolated learning, A-C
produced considerable forgetting, substantially more than A-Br produced.
However, on a test given after some of the responses were recalled by special
prompting, the A-C group did as well as the A-Br group. Finally, in a
matching test the A-C group's performance exceeded that of the group with A-Br
[...]

Table 8-1  Proportions of Items Remembered from First List

Interpolated    MMFR    Response       List-1
Paradigm        Test    Completion     Recall    Matching
A-Br            .81     .98            .69       .72
A-C             .57     [illegible]    .70       .94
C-B             .88     .94            .89       .93
C-D             .88     .89            .93       [illegible]

[...] note performance on the first list, since all subjects
[...] items [...] interpolated learning and different items in the
[...] The mean [...] criterion in the four conditions were as
follows: [...] 5.3; C-B, 5.3; C-D, 6.2. The differences in trials to
criterion were not reliable, [...] p > .10. In analysis of the mean
number of errors in List 1, the F-ratio was larger, but also not significant,
F(3,119) = 2.63, p > .05.

Differences in difficulty of interpolated learning have already been
discussed in Chapter 6. We note here the differences in mean trials to criterion
among the four paradigms in the groups where interpolated learning was
taken to criterion. These constituted differences in the delays between first-list
learning and the tests of recall. The mean trials to criterion on List 2 were as
follows: A-Br, 6.7; A-C, 6.1; C-B, 3.1; C-D, 3.7.


MMFR Results

Table 8-2 presents the full results of the MMFR test. In "strict" scoring, an
item was counted correct only if the response was given with its stimulus and
identified correctly as to list membership. In lenient scoring, an item was
scored as correct if its response occurred anywhere. Substantially higher
scores for C-B and A-Br, and small increases ranging from .01 to .04 for
C-D and A-C, were shown in lenient scoring. Analysis of variance for List-1
recall showed significant effects of paradigms, F(3,119) = 8.95, p < .01, of
the amount of interpolated learning, F(1,119) = 7.13, p < .01, and of the
interaction between paradigms and amount of interpolated learning, F(3,119)
= 5.00, p < .01. Pairwise comparison carried out using the Newman-Keuls
test showed that the group receiving A-C training to criterion was poorer
than any other group, p < .01; all other differences were nonsignificant.
Regarding List-2 recall, only the effect of amount of interpolated training was
significant, F(1,119) = 76.7, p < .01.
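
The difference between the two scoring rules can be made concrete with a small sketch. The data layout (a dictionary from each stimulus to the responses written beside it, each tagged with the list the subject assigned) and the function names are assumptions made for illustration, as are the stimulus GAC and its pairing with MANLY.

```python
def strict_correct(sheet, stimulus, response, list_no):
    """Strict: the response is beside its own stimulus with the right list label."""
    return (response, list_no) in sheet.get(stimulus, [])


def lenient_correct(sheet, response):
    """Lenient: the response appears anywhere on the answer sheet."""
    return any(resp == response
               for entries in sheet.values()
               for resp, _ in entries)


# Hypothetical answer sheet: MANLY was written beside the wrong stimulus.
sheet = {
    "GAC": [],
    "LAJ": [("MANLY", 1)],
}

strict_result = strict_correct(sheet, "GAC", "MANLY", 1)
lenient_result = lenient_correct(sheet, "MANLY")
```

An item scored this way counts under lenient scoring but not under strict scoring, which is the pattern that would raise lenient scores most in paradigms where first-list responses reappear in the second list.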


Two trials of interpolated learning produced only a modest amount of
[...] the paradigms. When training was continued to criterion,
A-C produced [...]


Table 8-2  Proportion Correct on MMFR Test

Amount of
Interpolated                List 1    List 1
Training       Paradigm     Strict    Lenient        List 2
Criterion      A-Br         .811      [illegible]    .994
Criterion      A-C          .572      [illegible]    .961
Criterion      C-B          .878      .978           .978
Criterion      C-D          .883      .889           .989
Two Trials     A-Br         .839      [illegible]    [illegible]
Two Trials     A-C          .839      .872           .667
Two Trials     C-B          .911      .989           .778
Two Trials     C-D          .867      .878           .756

[...] training has produced significant first-list forgetting [...]
Stark, 1969) and the discrepancy is probably due to a difference in the manner
[...] been given by presenting one item at a time and requiring the
subject to give the two responses for that item. With all items presented
simultaneously, subjects could write in the responses they knew, then use a process
of elimination to help sort out responses to other items, changing responses
to some items if later responses helped them remember items they missed
earlier, and so on. James' results appear to agree with Keppel's (1968)
conjecture that A-B retention after A-Br interpolation will vary substantially
between single-item and whole-list testing procedures.


Response Completion 
Table 8-3 gives full results of the response completion test. Statistical
analysis of the unconditional performance showed that the main effect of the
paradigm differences and the interaction between paradigms and amount of
interpolated training were both significant, F(3,119) = 14.53, p < .01;
F(3,119) = 6.35, p < .01. Newman-Keuls tests showed that the A-C group
with criterion training was reliably poorer than all other groups, and no
other pairs of groups differed reliably.


Table 8-3  Performance on Response Completion Test (Test 2)

Amount of
Interpolated
Training       Paradigm     P(C2)          P(C2|C1)    P(C2|I1)
Criterion      A-Br         .978           .981        .941
Criterion      A-C          [illegible]    .825        .57
Criterion      C-B          .944           .937        1.000
Criterion      C-D          .894           .931        .619
Two Trials     A-Br         .956           .974        .862
Two Trials     A-C          .883           .921        .690
Two Trials     C-B          .917           .933        .750
Two Trials     C-D          .894           .949        .542


The two columns to the right in Table 8-3 show performance on the
response completion test conditionalized on whether the first-list response
was given correctly (by strict scoring) in the MMFR test. It is interesting that
the decrement in first-list recall after A-C learning was present for items that
were correct in MMFR as well as for items that had been missed. It seems
[...] the B responses that were recalled in MMFR were forgotten
before the response completion test; a more likely hypothesis is that
some responses that could be retrieved with the stimulus (and possibly the
second-list response) presented as a cue could not be retrieved in the absence
of that cue. Finally, we note that the response fragments given failed to serve
as reliable retrieval cues for all responses. The A-C criterion group failed to
give 33 of the B responses correctly (nearly two per subject) of the 77 that
were missed on MMFR. Although the values of P(C2|I1) are similar for
the two C-D groups and the A-C criterion group, it should be noted that in
the C-D groups only a few items were missed in MMFR: 21 in C-D criterion
and 24 in C-D with two trials. Thus, there were probably selection factors
operating rather strongly to make the values of P(C2|I1) as low as they were
in the C-D conditions.


Recall after Response Completion 


Table 8-4 gives results of the test for List-1 recall following the response
completion test. Analysis of variance for the unconditional performance
gave significant effects of the paradigms, F(3,119) = 8.18, p < .01; of the
amount of interpolated learning, F(1,119) = 15.15, p < .01; and of their
interaction, F(3,119) = 3.86, p < .05. Newman-Keuls tests showed that the
A-Br and A-C criterion groups were not different from each other, but both
were reliably poorer than all the other groups, p < .01.

Significance tests were applied to determine reliability of change in
performance between the MMFR test and the test of recall following response
completion. These comparisons were made by simple t-tests on the differences in
performance between Test 1 and Test 3 for individual subjects in each group.
Improvement in performance would be expected in conditions A-C and C-D
if recall of responses in Test 2 aided recall of associations. Indeed, significant
improvement was found for both of the A-C groups, p < .01, and for the
C-D criterion group, p < .05. Surprisingly, there was also significant
[...] .01. This was almost surely not caused
by the response completion [...]; rather, we think it must have occurred
[...] In Test 3, subjects were asked only
to fill in the responses from the first list. We infer from their poorer
performance that the requirement of giving second-list responses in the MMFR
test must have had a facilitating effect on recall of first-list responses.


Table 8-4  Performance on First-List Recall (Test 3) Following Response
Completion Test

Amount of
Interpolated
Learning      Paradigm   P(C3)   P(C3|C1)   P(C3|I1)   N(I1 ∧ C2)   P(C3|I1 ∧ C2)
Criterion     A-Br       .694    .788       .294       32           .219
Criterion     A-C        .700    .981       .325       44           .409
Criterion     C-B        .889    .975       .273       22           .273
Criterion     C-D        .933    .994       .476       13           .692
Two Trials    A-Br       .872    .947       .483       25           .560
Two Trials    A-C        .889    .967       .486       20           .650
Two Trials    C-B        .956    .994       .562       12           .583
Two Trials    C-D        .906    .968       .500       13           .692


The second and third columns of data in Table 8-4 show performance on
Test 3 conditional on whether the correct response was given in Test 1. From
P(C3|C1), we can conclude that retention was uniformly high: virtually no
items were forgotten during the interval between the tests. The value of
P(C3|C1) was low only in the A-Br criterion condition and we believe this
was probably due to facilitation of performance on Test 1, as we noted above.

The values of P(C3|I1) all seem too high to be attributed to guessing, and
it therefore seems likely that there was recovery of associations between Test
1 and Test 3 in all conditions. This could have occurred because of the
response completion test, in which recall of responses may have caused
associations to be reactivated, improving prospects for recall in the forward
direction in Test 3. This idea is consistent with the high values of
P(C3|I1 ∧ C2) given in the last column. (The number of cases in individual
conditions tends to be small; however, with values above .50 in all the conditions
with two trials of interpolated learning, it seems likely that the high values were
not due to chance.)

Finally, we note that while response recall in Test 2 must have facilitated
recall of some items in Test 3, especially for the A-C groups, response recall
did not ensure recall of all items in the recall test. Indeed, in the A-C
criterion group, more than one-half of the items that were missed in MMFR and
then recalled in Test 2 were missed again in the List-1 recall test given as
Test 3.

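
One way to check how the conditional columns relate to the unconditional recall scores is the law of total probability, P(C3) = P(C3|C1) P(C1) + P(C3|I1) (1 - P(C1)). The sketch below applies it to the values as read from Tables 8-2 (strict List-1 scores) and 8-4; the function and variable names are hypothetical, and the figures are transcriptions from the printed tables, so small rounding discrepancies are to be expected.

```python
# Each row: (training, paradigm, P(C1), P(C3|C1), P(C3|I1), reported P(C3)),
# with P(C1) taken as the strict MMFR score for List 1.
ROWS = [
    ("Criterion", "A-Br", 0.811, 0.788, 0.294, 0.694),
    ("Criterion", "A-C", 0.572, 0.981, 0.325, 0.700),
    ("Criterion", "C-B", 0.878, 0.975, 0.273, 0.889),
    ("Criterion", "C-D", 0.883, 0.994, 0.476, 0.933),
    ("Two Trials", "A-Br", 0.839, 0.947, 0.483, 0.872),
    ("Two Trials", "A-C", 0.839, 0.967, 0.486, 0.889),
    ("Two Trials", "C-B", 0.911, 0.994, 0.562, 0.956),
    ("Two Trials", "C-D", 0.867, 0.968, 0.500, 0.906),
]


def unconditional(p_c1, p_c3_given_c1, p_c3_given_i1):
    """Combine the two conditional proportions by the law of total probability."""
    return p_c3_given_c1 * p_c1 + p_c3_given_i1 * (1.0 - p_c1)


def max_discrepancy(rows):
    """Largest gap between the recomputed and the reported P(C3)."""
    return max(abs(unconditional(r[2], r[3], r[4]) - r[5]) for r in rows)


worst_gap = max_discrepancy(ROWS)
```

With these inputs the recomputed values agree with the reported ones to within rounding, which supports reading the conditional columns as conditioned on strict MMFR scoring.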


Matching Test 
The results obtained in the matching test are shown in Table 8-5. In
analysis of variance, reliable effects were obtained due to paradigms, F(3,119)
= 58.56, p < .01; degree of interpolated learning, F(1,119) = 59.56, p < .01;
and their interaction, F(3,119) = 16.7, p < .01. Newman-Keuls tests showed
that A-Br with criterion was reliably worse than every other group, p < .01,
and A-Br with two trials was reliably worse than all the other two-trial groups
as well as the criterion C-D group, p < .05.


Table 8-5  Performance on Matching (Test 4) and Conditional Performance
on List-1 Recall (Test 3)

Amount of
Interpolated
Learning      Paradigm   P(C4)          P(C4|C3)       P(C4|I3)
Criterion     A-Br       [illegible]    [illegible]    .345
Criterion     A-C        [illegible]    [illegible]    .815
Criterion     C-B        .933           .988           .500
Criterion     C-D        .967           .992           .583
Two Trials    A-Br       .906           .968           .478
Two Trials    A-C        [illegible]    1.000          1.000
Two Trials    C-B        [illegible]    [illegible]    .500
Two Trials    C-D        .972           .982           .882


When a matching test is given, some improvement would occur if no
information were added other than the set of alternative answers. James tested the
hypothesis that the improvement in performance from Test 3 to Test 4 was
entirely due to guessing. Suppose that only the items known by the subject
were given correctly on Test 3. Then, when Test 4 occurred, the subject would
have to guess the remaining items. Given n unknown items, the probability of
matching j of them by guessing is

$$P(j, n) = \frac{1}{j!}\left[1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \cdots \pm \frac{1}{(n-j)!}\right]$$

(Feller, 1950, p. 97). Let X be the number of items matched by guessing from
n unknown items. It turns out that regardless of n,

$$E(X) = 1, \qquad V(X) = 1.$$



Thus, according to the hypothesis, the expected number of correct responses 
on Test 4 should be one greater than the number of correct responses on 
Test 3 (we consider only subjects who missed one or more items in Test 3). 
The standard error of the difference (Test 4 minus Test 3) should equal the 
square root of the number of subjects having one or more errors on Test 3.
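
The claim about the mean and variance of the number of chance matches can be checked numerically. The sketch below computes the exact matching distribution with rational arithmetic; the function names are hypothetical. One caveat that is standard for this distribution: the variance equals 1 for n of 2 or more, while with a single unknown item the match is certain and the variance is 0.

```python
from fractions import Fraction
from math import factorial


def p_matches(j, n):
    """Probability of exactly j correct matches when n items are paired at random."""
    tail = sum(Fraction((-1) ** k, factorial(k)) for k in range(n - j + 1))
    return Fraction(1, factorial(j)) * tail


def mean_and_variance(n):
    """Exact mean and variance of the number of chance matches among n items."""
    probs = [p_matches(j, n) for j in range(n + 1)]
    mean = sum(j * p for j, p in enumerate(probs))
    second_moment = sum(j * j * p for j, p in enumerate(probs))
    return mean, second_moment - mean ** 2


moments_for_five = mean_and_variance(5)
```

Because the mean is 1 for every subject who has at least one unknown item, the expected gain from Test 3 to Test 4 under pure guessing is one item per such subject, and the variances add across subjects.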

A statistical test was formed by computing the expected increase from Test
3 to Test 4, and subtracting that from the observed increase, dividing the
difference by the theoretical standard deviation. Call this ratio z. Under the
null hypothesis that the matching test presented no new effective
information, z should be asymptotically distributed as a standard normal deviate.
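
The statistic described above can be written out as a short function. This is a sketch under stated assumptions: the function name, the argument layout (per-subject numbers correct on Tests 3 and 4), and the example scores and list length of 12 are all hypothetical, not James' data.

```python
from math import sqrt


def guessing_z(test3_scores, test4_scores, list_length):
    """z for the gain from Test 3 to Test 4 against the pure-guessing hypothesis.

    Only subjects who missed at least one item on Test 3 are included. Under
    guessing, each such subject gains one item on average with variance 1.
    """
    pairs = [(t3, t4) for t3, t4 in zip(test3_scores, test4_scores)
             if t3 < list_length]
    m = len(pairs)
    if m == 0:
        raise ValueError("no subject missed an item on Test 3")
    observed_increase = sum(t4 - t3 for t3, t4 in pairs)
    expected_increase = float(m)   # one item per included subject
    standard_error = sqrt(m)       # variances of 1 add across subjects
    return (observed_increase - expected_increase) / standard_error


# Hypothetical scores for four subjects on a 12-item list.
example_z = guessing_z([10, 11, 12, 9], [12, 12, 12, 12], 12)
```

The third subject is excluded because nothing was missed on Test 3, so the example rests on three subjects with a total observed gain of six items against an expected gain of three.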
In three groups, James found that improvement was
significantly greater than could be expected [...]
[...] trials, z = 4.02, p < .01. The C-[...] two trials also [...]
significant increment, z = 2.23, p < [...] matching test provides
[...] that provided by the reinstatement of responses [...]
completion test.
The two columns to the right in Table 8-5 show performance in matching
conditional on the List-1 recall observed in Test 3. The notable result is
the low performance in A-Br, especially on items missed in Test 3. (The
results for C-B are also low, although we hesitate to draw conclusions from
these, due to the small number of items missed on Test 3 in these conditions.)
Apparently in A-Br, if a response could not be recalled when the stimulus
was presented, there was relatively little chance of its being remembered when
the response was also presented, as it is in a matching test.



Discussion 
Some features of James' results are consistent with both the cognitive
interpretation and concepts of associative interference theory. First, the
relatively small amount of forgetting after C-B or C-D interpolated learning is a
standard finding. The result agrees with the idea that forward associations
are most important in recalling A-B items, and these associations are lost
primarily when stimuli from the first list are present during interpolated
learning. The result also agrees with the cognitive hypothesis that forgetting
occurs because the retrieval network for the A-B items is modified during
interpolated learning. The network consists of tests for stimulus features, and
because the A stimuli are not used in C-B or C-D, it is expected that the
network for retrieving A-B should be disturbed much less than in A-C and
A-Br.

Another finding that agrees with both theories is that in A-C and A-Br,
where substantial forgetting occurred, there was more forgetting when
interpolated learning was continued to criterion than when only two trials of
interpolated learning were given. This also is a standard finding (e.g., Barnes
& Underwood, 1959) and is explained in associative interference theory by
the greater number of opportunities during criterion learning for
unreinforced elicitations of A-B responses; hence, for a greater amount of
unlearning. The explanation in the theory of retrieval networks is similar. With only
two trials of interpolated learning, subjects will not have developed
sufficient retrieval networks for the interpolated lists. The further training given
criterion groups led to further modifications of the A-B retrieval network;
hence, it led to greater amounts of forgetting.

[...] features in the A-C condition are difficult to explain
using the theory of associative and response-set interference. First, in that
theory most of the poor performance in MMFR would be attributed to
suppression of the first-list responses. Performance on the matching test was
very good for the A-C subjects. If many of the first-list associations had been
unlearned, matching would not have been successful. It is possible that some
of the improvement from A-C's MMFR performance on the matching test
was due to recovery of unlearned associations. However, since other
investigators (e.g., Postman & Stark, 1969) have observed good performance on
matching tests given immediately after A-C learning, we conclude that
performance on James' matching test probably indicates that individual A-B
associations were not unlearned to a great extent in the A-C condition.
A question arises of why there would be so much response-set suppression
in A-C when in C-D there was only a small amount. (Both conditions require
use of a new set of responses in interpolated learning.) In this connection,
Postman and Underwood (1973, p. 24) said, "The usual difference in
retention loss between the A-C and C-D paradigms does not in itself bear on the
question of the relative weight of response-set interference and item-specific
unlearning in determining RI (retroactive inhibition). The degree of
suppression of the first-list repertoire and consequent dominance of second-list
responses are expected to be related to the amount of negative transfer." This
seems a rather vague hypothesis, but we suppose it means that in A-C the
presence of first-list stimuli keeps the B responses active longer during
interpolated learning; therefore, a greater amount of inhibition must be built
up to permit the interpolated learning to occur.

It seems to us that this hypothesis conflicts with the results we reported in
Chapter 6 regarding negative transfer in A-C. Most of the negative transfer
(relative to C-D) occurs in the second stage of learning, after the probability
of the correct response has increased to a nonzero value. If it were assumed
that the first stage of learning mainly involves acquisition of responses, then
the explanation of forgetting given above would seem to imply that more
negative transfer should appear during the first stage of interpolated learning,
an explanation contrary to the empirical results. Furthermore, it seems
unnecessarily awkward to assume that forgetting results from suppression of
responses but that the amount of suppression is largely determined by whether
first-list stimuli are used in the interpolated task. It seems more
parsimonious to assume that since stimuli make such a great difference in the
amount of forgetting, the process of forgetting probably involves something
that the subject does regarding the stimuli as well as the responses.

Other aspects of A-C performance also argue against an idea of
interference that involves just responses rather than the whole associative
system. On the response-completion test, subjects were able to give .57 of the
A-B responses they missed in MMFR. The fragments shown to the subjects thus
provided some assistance in retrieving responses. However, a substantial
number of responses given correctly in MMFR were missed in the completion
test; P(C2|C1) was .825. The usual concept of response availability is
that it operates as a precondition for performance of an association; that is,
if the response is available and the stimulus-response connection is retained,
then the response will be given. Responses that were available in MMFR surely
should have been available during response completion, where extra cues were
provided. Thus, it is surprising that so many responses given correctly in
MMFR should have been missed in response completion.

Another fact about A-C performance that embarrasses the simple idea of
response availability involves first-list recall after response completion. One
would think that if response unavailability in its simple form was the cause
of most retroaction in A-C, then A-B associations should be recalled in
nearly all cases where the B response was reinstated by the fragment shown
in response completion. The fact was that first-list recall was successful in
fewer than one-half of the cases where an item was missed in MMFR but the
response was correct in the response completion test; P(C3 | C̄1 ∧ C2) was
.409.

We consider it more in accord with these findings to conclude that
forgetting in A-C involves loss of information about the stimulus-response
connections rather than just the responses. On the other hand, this lost
information must be recoverable, since presentation of the first-list responses
provides an occasion where most of the information needed for correct
performance is recovered. We think the most reasonable hypothesis is that when
both the stimuli and responses are present on a matching test, subjects can
enter the retrieval network using both the stimulus features and the response
terms and conduct a search for pairings that were associated in the first list.
(Anderson and Bower's, 1973, MATCH system provides a mechanism for
doing this.) The good performance on matching tests after A-C interpolated
learning is consistent with the expectation that search starting from both ends
should lead to success more often than search emanating from a single entry
point.
Finally, we consider performance of subjects in the A-Br condition. If we
take the performance on MMFR as a measure, we must conclude that there
was practically no unlearning of individual associations. However, in the
tests of first-list recall and matching, there were many errors. Some forgetting
may have occurred from Test 1 to Test 4, but forgetting should have been
relatively slight for first-list associations for which there was already a long
delay and much interference by the time Test 1 occurred.

The main criterion for an explanation of the A-Br performance is that
associations must still have been in memory, but retrieval required mediation
by the pairings used in interpolated learning. [...] considerably with pushdown
storage, creating choice points at many nodes. Suppose the subject initially
wrote out the second-list responses (as [...] of them did). Now assume that a
retrieval process begins. Because the network has been modified for A-Br
learning, one or more choice points are encountered. However, because the
second-list response has already been written down, the subject can determine
which path leads to the response already given and take the alternative path.
Of course, terms such as "take the alternative path" should be interpreted
metaphorically here; we do not believe that the tests and choices correspond to
conscious activity. However, the hypothesis that second-list responses were
used to sort out the first-list associations corresponds to our intuitions and
informal observations of subjects in the MMFR test.
Why would A-Br performance be inferior to A-C performance on the
matching test? Our best hypothesis is that because each B term was attached
to just one terminal node in the retrieval network, presentation of the response
terms from A-B gave A-C subjects more useful information. In A-Br, each
response term from List 1 also was a response term from List 2, and thus
would have to be represented at two terminal nodes of the final network. It
would not be possible, then, to simply initiate a network search from both the
stimulus and response term, looking for an intersection path. The
response-anchored search would have to start from two places, and this would
make the system less efficient.
We have been unable to think of any plausible explanation for
performance in the A-Br condition based on associative interference theory. The
problem is that for second-list responses to mediate retention of A-B,
connections must have been formed between the two responses for a stimulus
during interpolated learning. It is known that subjects can be induced to
learn an interpolated list through mediation, with the first-list responses as
intermediate terms. However, such learning generally requires rather explicit
instructions and is facilitated by similarity between each second-list response
and its mate from the first list (for example, the use of synonyms). In the
standard A-Br situation, continued use of the first-list response would
increase intralist interference for the interpolated learning, and it would seem
unlikely that subjects would adopt it as a strategy. Consequently, it seems
unlikely that subjects would form the response-response associations that
apparently are needed for facilitating first-list retention through mediation.


INDEPENDENCE OF RESPONSE RECALL 


In Chapter 7 we presented data obtained by DaPolito indicating that, in
a MMFR test, recall of A-B and recall of A-C associations are stochastically
independent. An important feature of DaPolito's results was the absence of
retroactive forgetting. Because of this, we are able to draw relatively strong
conclusions from the independence of responses regarding the mechanism of
learning that causes negative transfer and proactive forgetting.

For reasons we will discuss later, the theoretical strength of a finding of
independence is not as great when retroaction occurs. However, independence
has been tested in several experiments where retroactive forgetting took
place, and the results have nearly always been consistent with the null
hypothesis of independent recall.
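
As a rough illustration of how such a null hypothesis can be tested, the sketch
below computes a Pearson chi-square statistic for a 2 x 2 table that
cross-classifies stimuli by whether the B response and the C response were
recalled. It assumes the ordinary 1-df statistic without continuity correction;
the text does not state the exact computation that was used, and the function
name and the counts are hypothetical.

```python
def chi_square_2x2(both, b_only, c_only, neither):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table.

    Arguments are counts of stimuli for which both B and C were recalled,
    only B was recalled, only C was recalled, and neither was recalled.
    """
    observed = [[both, b_only], [c_only, neither]]
    total = both + b_only + c_only + neither
    row_sums = [both + b_only, c_only + neither]    # B recalled, B missed
    col_sums = [both + c_only, b_only + neither]    # C recalled, C missed
    if 0 in row_sums or 0 in col_sums:
        raise ValueError("a marginal total is zero; the statistic is undefined")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if B and C recall are independent.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts, not taken from any experiment discussed in the text.
independent = chi_square_2x2(40, 10, 40, 10)  # C recalled equally often with or without B
dependent = chi_square_2x2(30, 20, 20, 30)    # C recalled more often when B is recalled
print(independent, dependent)
```

With 1 degree of freedom the conventional .05 critical value is 3.84, so a
statistic below that value is consistent with independent recall.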


Koppenaal’s Data 


Results obtained by Koppenaal (1963) were analyzed by DaPolito (1966)
regarding the hypothesis of independence. Koppenaal's experiment included
an A-B, A-C transfer condition as well as a single-list control condition. All
lists were learned to a criterion of one perfect trial. Recall was tested with a
MMFR test in the transfer condition and a standard recall test in the control
condition. There were seven different retention intervals for subjects in
different groups.

Table 8-6 Analysis of Independence of Responses in MMFR from Koppenaal's (1963)
Experiment

Retention     P(correct)
Interval      in Control   P(B)   P(C)   P(C|B)   P(C|B̄)   χ²
One Minute      .990       .663   .944   .934     .963     [illegible]
20 Minutes      .950       .669   .888   .888     .887     .00
90 Minutes      .850       .664   .894   .922     .842     1.71
Six Hours       .900       .744   .850   .815     .951     3.43
24 Hours        .860       .644   .738   .738     .737     .00
Three Days      .780       .613   .631   .643     .613     .05
Seven Days      .570       .500   .419   .475     .363     1.64

Table 8-6 shows the results of DaPolito's analysis. Note the substantial
amount of retroactive forgetting in all seven tests. The performance on A-C
items was apparently as good as on the control items during tests given the
same day as learning, although this probably indicates only that the
requirement of learning to criterion forced learning to occur to a level that
compensated for the negative transfer that undoubtedly occurred. Both proactive
and retroactive forgetting are seen in the tests given one or more days after
the lists were learned. The results in all the tests were consistent with the
hypothesis of independent recall of responses.
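
The relation among the columns of Table 8-6 can be checked with a little
arithmetic. The sketch below assumes the columns are P(B), P(C), P(C|B), and
P(C|B̄) in that order, and uses two rows whose printed values are legible. The
overall P(C) should equal the weighted average of the two conditional
probabilities, and under independence the two conditionals should be about
equal. The function name is hypothetical.

```python
def p_c_from_conditionals(p_b, p_c_given_b, p_c_given_not_b):
    """Overall P(C) implied by P(B) and the two conditional probabilities."""
    return p_b * p_c_given_b + (1.0 - p_b) * p_c_given_not_b


# Rows as read from Table 8-6: P(B), P(C), P(C|B), P(C|not B).
rows = {
    "20 Minutes": (0.669, 0.888, 0.888, 0.887),
    "90 Minutes": (0.664, 0.894, 0.922, 0.842),
}

for label, (p_b, p_c, p_c_b, p_c_nb) in rows.items():
    implied = p_c_from_conditionals(p_b, p_c_b, p_c_nb)
    gap = p_c_b - p_c_nb  # near zero when recall of C does not depend on B
    print(f"{label}: implied P(C) = {implied:.3f}, printed P(C) = {p_c:.3f}, "
          f"P(C|B) - P(C|not B) = {gap:+.3f}")
```

The implied and printed values of P(C) agree to within rounding for both rows,
which supports this reading of the column headings.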


Postman's Data

Table 8-7 shows DaPolito's analysis of Postman's (1964) data for
A-B, A-C and A-B, A-Br training. The sets refer to
different lists given to different groups of subjects. These data also show
strong agreement with the hypothesis of independent recall of responses.

Table 8-7 Analysis of Response Independence in Postman's (1964) Experiment

Paradigm     Set   P(B)   P(C)          P(C|B)   P(C|B̄)   χ²
A-B, A-C      1    .458   .368          .287     .435     2.76
A-B, A-C      2    .590   .645          .635     .661     .02
A-B, A-C      3    .618   [illegible]   .831     .690     3.11
A-B, A-Br     1    .465   .368          .343     .428     [illegible]
A-B, A-Br     2    .645   .562          .516     .647     1.79
A-B, A-Br     3    .645   .673          .698     .627     .47

Table 8-8 Analysis of Independence in MMFR in James' Experiment

Amount of
Interpolated
Learning     Paradigm              P(B∧C)   P(B∧C̄)   P(B̄∧C)        P(B̄∧C̄)        χ²
Criterion    A-B, A-C   Observed   .54      .03      .42           [illegible]
Criterion    A-B, A-C   Expected   .56      .02      [illegible]   [illegible]   1.32
Two Trials   A-B, A-C   Observed   .54      .29      .12           [illegible]
Two Trials   A-B, A-C   Expected   .56      .28      [illegible]   [illegible]   [illegible]


James’ Data 


The data obtained in the A-C conditions of James' (1968) experiment were
also analyzed regarding the hypothesis of independence in MMFR. The
results are in Table 8-8. The data agreed with the hypothesis of independence,
but the tests were not very stringent, due to the high level of A-B recall in the
group with two interpolated trials, and of A-C recall in the criterion group.
The two-trial group showed no retroactive forgetting, so its performance
provides further evidence for independence when only proactive forgetting
and negative transfer occur.

Wichawut and Martin's Data 


A stronger test of response independence involving retroaction was given
in Wichawut and Martin's (1971) experiment. They presented a 16-item A-B
list until the subject had given at least one correct response to every item. A
second list was presented for 12 trials, in which A-C and C-D items were
intermixed. In it, 4 A-B items had no A-C items, 4 had A-C items that
appeared 4 times, 4 had A-C items that appeared 8 times, and 4 had A-C
items that appeared 12 times. A MMFR test was given after the interpolated
learning.

The varying number of A-C presentations produced different amounts of
A-C recall, as would be expected. The values of P(C) observed were .64, .79,
and .85, respectively, with 4, 8, and 12 presentations. In each condition,
data were consistent with the hypothesis that B and C responses were
independent. But the most important finding was that P(B) was not affected by
the number of interpolated A-C trials. The values of P(B) obtained were .73,
.62, and .72 with 4, 8, and 12 A-C presentations, respectively, and the
difference was not reliable, F(2, 70) = 2.07, p = .16. As with the data of
DaPolito (1966) in which number of A-B presentations did not affect A-C
recall, Wichawut and Martin's results provide a case where the independence
of responses cannot have been a statistical artifact, since the varying strength
of A-C was produced by an experimental manipulation.
Discussion


We argued in Chapter 7 that independence of responses in DaPolito’s 
experiments had a strong theoretical implication—it contradicts any hypothe- 
sis in which the proactive interfering effect of an association depends directly 
on the strength of that association. The situation is more complicated when 
retroactive forgetting is involved. Here we have to consider possible recip- 
rocal effects—effects of the strength of A-B on the retrievability of its inter- 
polated counterpart, and also effects of the strength of an interpolated item 
on the retrievability of its mate from the A-B list.

We have argued previously (Greeno, James, & DaPolito, 1971; Martin &
Greeno, 1972) that the concepts of associationist theory should lead to the
expectation of a negative dependency between recall of B and C responses to
the same stimulus; that is,

P(B ∧ C) < P(B) · P(C).

We have thus interpreted the consistent finding of independence as a
contradiction of the associationist ideas of unlearning and response competition
as explanations of forgetting. The argument leading to this prediction has some
elements in common with the position taken in Chapter 7. It might be thought
that stronger A-B associations would interfere more with the learning of
their corresponding A-C associations than would weaker A-B associations.
Conversely, if some individual A-C association were learned rapidly, the
effect would be to weaken its corresponding first-list association more than
the average.

Postman and Underwood (1973) have argued that there is no reason to
expect negative dependency between responses on the basis of associationist
theory. They seem to assume that the existence of an A-B association
produces no interference with learning an A-C association with the same
stimulus. Unlearning occurs because the B response from A-B is elicited
during interpolated learning, rather than through any interaction between the
associative connections. Indeed, they derive the prediction that there should
be a positive dependency between the B and C responses for corresponding
associations. The reasoning is that if A-C is learned relatively early in
interpolated learning, the effect will be to block the elicitation of A-B, and
therefore to protect it from unlearning.

We find Postman and Underwood's analysis puzzling for two reasons.
First, if the assumption is made that two connections to a stimulus can be
made as easily as one, then we do not understand why negative transfer
should occur. Second, the mechanism of blocking elicitation of first-list
responses through strengthening of second-list associations seems to suggest
that stronger second-list associations indeed interfere more with first-list
associations than weaker ones do. On the other hand, since a positive
dependency between individual responses has not been reported, perhaps we should
conclude that their suggestion of such a mechanism should be considered
only as a possibility rather than a definite hypothesis implied by the basic
assumptions of associative interference theory.

Nevertheless, we think there is a version of associative interference theory
that is compatible with the finding of independent responses when retroactive
forgetting occurs. The idea is that stronger first-list associations interfere for
a longer time with second-list learning; consequently, they lose more of their
initial strength. The result could be that all first-list associations eventually
finish at the same strength or, more realistically, finish with a distribution of
strengths that is independent of the initial strengths. We conclude, then, that
independence of responses in MMFR is not evidence against associative
interference theory when retroactive forgetting has occurred. We note,
however, that this version of the theory does imply that stronger first-list
associations will cause more negative transfer and proactive interference than
weaker ones, and so is still inconsistent with results reported earlier in
Chapters 6 and 7.


CONCLUSIONS 


The concept of a retrieval network provides the basis for a theory of
retroactive forgetting in which it is assumed that the network is modified by
interpolated learning through addition of pushdown storage with initial
branches of the network pushed down in memory. This theory explains some facts
that were previously explained by the hypothesis of response-set interference,
and the theory of retrieval networks also appears to have some advantages
regarding details of interrelationships between performance on different
tests. The theory is consistent with findings of response independence in
MMFR when retroaction occurs, and although we now can see a way to make
associative interference theory similarly consistent with that finding, the
associationist mechanism we see as a cogent explanation of independence is
still not consistent with salient facts about negative transfer and proactive
interference.


We have now completed our presentation of evidence against associationist
theory. Although we have a high regard for this theory, considering it one
of the strongest intellectual achievements thus far developed in scientific
psychology, we also think it is wrong, and the main content of this book has
given the empirical basis for our conclusion that the main assumptions of
associationist theory are incorrect.

There are three main attacks against associationist theory. One is based
on analyses of complex human information processing in the understanding
of natural language, problem solving, and other complex task environments.
Theoretical work in these areas has proceeded rapidly in recent years, but
the concepts of associationist theory appear to be too weak to be of any
great use in representing the processes that occur.

A second attack is based on analyses of verbal learning tasks such as
recognition or free recall. The theories reviewed in Chapter [...] provide a
good understanding of the process of storing information in memory, but
they do not depend on the basic concepts of associationism in a strong way. The
theories of recognition and recall that seem to be most useful and empirically
valid are based on concepts of storing and retrieving information, with
little or no involvement of processes of forming connections between mental
elements.

On the basis of these two attacks alone, associationist theory could be set
aside as being irrelevant to the analysis of processes that we now find we can
understand better using other concepts. Such a judgment would undermine
the strong claim of associationism, namely that formation of undifferentiated
connections between mental elements is the basic process in acquiring knowledge.
However, it would leave the psychological theory of learning in a rather
untidy condition. A vast body of experimental work has been developed using
the task of paired-associate memorizing. This experimental work has been
interpreted using the concepts of associationist theory and has been seen by
many as providing evidential support for the assumptions of that theory.
The evidence we have presented argues against that interpretation and
provides a third kind of attack against associationist theory. We believe the
evidence shows that associationist assumptions are incorrect, even for the
experimental task of memorizing paired associates.

The paired-associate memorizing experiment has provided the paradigm
case of forming new mental connections, and results obtained in that situation
have provided the main empirical guidance in developing many of
associationist theory's concepts. We recognize the difficulties and uncertainties
of attacking a theory in its own data base; in fact, that has been thought by
some to be impossible in principle (Kuhn, 1962). Our view is that critical
discussions of interpretation regarding common data are not only possible,
but essential in the growth of scientific knowledge. And to set aside a set of
concepts as important as those of associationist theory without careful
consideration of the main data on which that theory rests would involve a serious
intellectual disservice.

The theory we have proposed as an alternative to associationist theory
combines a Gestalt interpretation of the nature of association and an
assumption about the process of retrieval based on recent theories of human
information processing. The Gestalt idea is that association is one form of
cognitive organization and that learning a new association consists of forming
a new mental unit with the associated elements as components. The general form
of an association, then, is a relational structure that represents the associated
elements in an interactive way. The information-processing analysis of
retrieval assumes that to retrieve the learned associations, the subject
develops a cognitive system that can be represented as a network of tests of
stimulus features. Because of the relational nature of the basic associative
learning, the features included in the network will be influenced by the
response terms. Sufficient features must be included to allow discrimination
among stimuli. Relationships among different items will be reflected in an
organization of the retrieval network that permits retrieval to occur in an
efficient way.
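
One way to picture a network of stimulus feature tests is the toy sorting
structure sketched below. It is only an illustration under stated assumptions:
stimuli are taken to be equal-length strings whose letters serve as features, a
test is added only when two stimuli would otherwise reach the same terminal,
and the names `Node`, `store`, and `retrieve` are hypothetical. The theory in
the text does not specify this algorithm, and the sketch ignores the influence
of response terms and the grouping of related items.

```python
class Node:
    """Either a terminal holding one pair or a test on one stimulus position."""

    def __init__(self):
        self.position = None  # index of the feature tested here (None at a terminal)
        self.branches = {}    # feature value -> child Node
        self.pair = None      # (stimulus, response) held at a terminal


def store(node, stimulus, response):
    """Add a pair, creating a new feature test only when it is needed."""
    if node.position is not None:
        # Interior node: follow (or create) the branch for this feature value.
        child = node.branches.setdefault(stimulus[node.position], Node())
        store(child, stimulus, response)
    elif node.pair is None or node.pair[0] == stimulus:
        # Empty terminal, or the same stimulus again: keep the pair here.
        node.pair = (stimulus, response)
    else:
        # Two different stimuli reached one terminal: test the first position
        # at which they differ, then sort both pairs again from this node.
        old_stimulus, old_response = node.pair
        node.pair = None
        node.position = next(
            i for i, (new, old) in enumerate(zip(stimulus, old_stimulus)) if new != old
        )
        store(node, old_stimulus, old_response)
        store(node, stimulus, response)


def retrieve(node, cue):
    """Follow the feature tests for a cue; return the stored response or None."""
    while node is not None and node.position is not None:
        node = node.branches.get(cue[node.position])
    return node.pair[1] if node is not None and node.pair is not None else None


# Hypothetical nonsense-syllable stimuli paired with arbitrary responses.
root = Node()
for stimulus, response in [("BAF", "one"), ("BOK", "two"), ("XEL", "three")]:
    store(root, stimulus, response)

print(retrieve(root, "BAF"))  # full stimulus
print(retrieve(root, "-A-"))  # only the tested component is shown
print(retrieve(root, "B--"))  # only an untested component is shown
```

In this example the three stimuli are discriminated by the middle letter alone,
so a cue that preserves the middle letter retrieves the response while a cue
that preserves only the first letter does not.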

We now review the main conclusions regarding applications of this cogni- 
tive theory to various aspects of paired-associate memorizing: 

First, consider response learning and stimulus encoding. Analyses of these
hypothetical processes have been developed mainly during the 1960s as
major amendments to the classical associationist assumptions. The major
phenomena explained by these hypotheses can be as well accommodated to
a cognitive theory.

When the response term of an association is complex and unfamiliar to
the subject, the organized unit that must be stored to represent the association
will be harder to achieve than when the response term is already unitary.
Therefore, it is to be expected that associations with more meaningful
responses will be easier to learn and that practice, especially on
nonmeaningful responses, will facilitate later paired-associate memorizing. It is
not necessary, however, to conclude that response learning occurs as a separate
phase of paired-associate memorizing. It seems more appropriate to consider
integration of the response as part of the process of representing the
association as a relational structure.


Selective encoding of stimulus features is implied by the assumption of
a retrieval network consisting of feature tests that are added as needed for
stimulus discrimination. Thus, findings that subjects can give the correct
response when shown some components of a stimulus but not when shown
other components are a natural consequence of the assumptions of a cognitive
theory; additional attentional mechanisms or encoding processes are not
required as they are for associationist theory. Subjects' tendencies to use
imaginal encodings and other elaborations of stimulus-response pairs seem
easily understood when it is assumed that the main process in associative
learning is formation of a relational structure that includes the associated
elements as components. A pictorial image or a proposition can often permit
a subject to use relational materials already stored in memory in forming the
new associative unit.

In Chapter 4 we presented analyses of acquisition of paired associates
based on a two-stage Markov model of learning. The results led us to conclude
that the stages of learning can be identified with the two main processes
specified in the cognitive theory. The first stage involves storage of a relational
representation of the stimulus-response pair. The second stage involves
inclusion of a pair's stimulus features in a retrieval system that permits the
pair to be retrieved successfully on tests. This leads to the expectations that
difficulty of the first stage should be determined jointly by variations in
stimuli and responses, both of which affect the difficulty of achieving a
memorable encoding, but that difficulty of the second stage should be determined
primarily by variations in the stimuli, which affect the difficulty of forming a
sufficient retrieval system.
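
The idea of measuring the difficulty of each stage separately can be
illustrated with a deliberately simplified chain. The sketch assumes that an
item completes the first stage with a constant probability `a` on each trial
and then completes the second stage with a constant probability `c` on each
trial, so that the time spent in each stage is a geometric waiting time. The
parameter names and values are hypothetical, and the two-stage model analyzed
in Chapter 4 may contain additional parameters that this sketch omits.

```python
def mean_trials_per_stage(a, c):
    """Expected number of trials needed to complete each stage.

    a: per-trial probability of completing stage 1 (storing the pair).
    c: per-trial probability of completing stage 2 (including the pair's
       stimulus features in the retrieval system).
    A geometric waiting time with success probability p has mean 1 / p.
    """
    if not (0 < a <= 1 and 0 < c <= 1):
        raise ValueError("a and c must be probabilities greater than zero")
    return 1.0 / a, 1.0 / c


# Hypothetical values: a manipulation that mainly lowers c lengthens stage 2.
baseline = mean_trials_per_stage(a=0.5, c=0.5)
harder_stage_two = mean_trials_per_stage(a=0.5, c=0.2)
print(baseline, harder_stage_two)
```

Under these assumptions a variable that affects mainly the second stage shows
up as a change in `c` with `a` left roughly constant, which is the kind of
pattern the expectations above describe for stimulus variables.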

The results were in general agreement with these expectations. Stimulus
similarity and stimulus meaningfulness both had large effects on the mean
number of trials needed to accomplish the first stage; so too did response
meaningfulness, response word frequency, and response pronounceability.
In the second stage, the effect of stimulus similarity was always at least as
large as it was in the first stage, and in most cases stimulus similarity had a
considerably larger effect on second-stage difficulty than on first-stage
difficulty. Stimulus concreteness had a very large effect on the second stage,
making learning essentially all-or-none when concrete stimuli were used.
Stimulus meaningfulness also had marked effects on second-stage difficulty,
especially when stimuli were difficult to distinguish or when nonsense items
were used. Response variables, in contrast, generally had either no effect or
relatively weak effects in the second stage. Response pronounceability and
response word frequency did not affect the second stage significantly, and
response concreteness had a smaller effect on Stage 2 than stimulus
concreteness had. Response meaningfulness affected the second stage
substantially when stimuli were not meaningful. To explain this effect we suggest
that there may be a kind of induced meaningfulness for nonsense stimuli when
responses are words, involving greater use of meaningful encodings for the
nonsense items, and that with these encodings it is easier to develop a
satisfactory retrieval system.

An interpretation of the associationist concepts of response learning,
stimulus encoding, and forming a connection can be made that fits the main
trends in our results. According to this interpretation, the first stage of the
Markov model corresponds to learning the response and forming the
stimulus-response connection; the second Markov stage corresponds to
acquiring a distinctive and stable encoding of the stimulus. This view would
be quite similar to ours, especially if it were assumed that response learning
and connection formation could go on simultaneously, forming a single
process of storing information about the pair. However, we would still
maintain that the information stored about each pair should be conceptualized as
a relational structure with stimulus and response terms as components, and
that the encodings of the stimuli acquired in the second stage should be
conceptualized as a retrieval system, incorporating features that allow
discrimination and efficiency of recall on tests.

In Chapter 5 we presented analyses of positive transfer. Results were
consistent with the hypothesis that transfer to an individual item occurs in an
all-or-none fashion, and that acquisition of a relationship between items that
can be used for transfer is also an all-or-none process. Our interpretation of
positive transfer is that subjects form relational groupings based on common
features among items. The idea that these relations are found in an all-or-
none fashion fits with a strong tradition in Gestalt psychology, in that
apprehending such relations constitutes a form of insight. Further, the
all-or-none character of recognition of a new item's membership in an acquired
category agrees with the cognitive view of generalization based on attention
to discrete features and seems inconsistent with the associationist analysis of
transfer based on generalization of associative strength. As Restle (1965) has
shown, theories that assume a continuum of associative strength can be
made compatible with all-or-none results by making appropriate assumptions
about individual differences. However, it seems likely that the assumptions
needed to accommodate all-or-none learning of categories and all-or-none
transfer might become quite complex, while the assumptions needed to
explain these phenomena in cognitive theory are simple and straightforward.

In Chapter 6 we used the two-stage Markov model to analyze acquisition of
paired associates under conditions of negative transfer. Our cognitive inter-
pretation of negative transfer is that interference with storage of a pair occurs
because of previous encoding of the stimulus in relation to the response it was
previously paired with, and that interference with development of a retrieval
system occurs because features and relationships useful to group items for one
list of pairs for efficient retrieval will generally not be as useful for organizing
a second list of pairs. In comparisons between A-Br and C-B, which use
responses from the first list, the amount of negative transfer occurring in the
second stage of learning was much greater than the amount of negative
transfer in the first stage. In comparisons between A-C and C-D, results
were not completely consistent, but most cases studied showed more nega-
tive transfer in the second stage than in the first. The greater amount of nega-
tive transfer in the second stage is consistent with the cognitive theory, and
indicates that the major source of difficulty is developing a modified or new
retrieval system based on old stimuli. The findings seem incompatible with
the idea that negative transfer is caused by interference between associations,
since that hypothesis seems to require that greater interference be produced
by associations that have greater strength, and therefore implies that negative
transfer should be greatest early in the process of learning the transfer list.
Further evidence against the idea that the amount of negative transfer
varies directly with associative strength was obtained in comparisons in
which overtraining was given for some subjects on the A-B list. In many
conditions there was no measurable effect of overtraining on the amount of
negative transfer in A-Br. When effects of overtraining were obtained, they
were usually small and were located in the first stage. We attribute these
rather small effects to interference produced by interpair groupings that
subjects probably acquire during overtraining.
When responses were nonsense syllables, A-Br and C-B conditions showed
positive transfer relative to C-D, showing the advantage of previous famili-
arity with responses. However, A-Br always suffered greater negative transfer
relative to C-B than was observed for A-C relative to C-D. This indicates
that there is greater interference with the acquisition of a new retrieval system
when the responses from the first list are present. We take this as support for
the idea that A-Br requires extensive modification of the retrieval network
but in A-C more of the initial retrieval system can be maintained.

In Chapter 7 we [illegible] interference. We presented results
obtained by DaPolito [illegible] recall and recognition of the two responses
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2. modification; for example, a plant is a structure of a particular kind, namely, 
a living structure.

fiers to represent information such as plants frequently have leaves, or plants
are not animals.

An example of a knowledge structure that can be constructed in Quillian's
model is given in Fi

Living structure that is not an animal,
frequently with leaves, getting its food from air, water, or earth. The diagr
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Figure 2-1 Knowledge structure for one mea 
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taken from experiments where the task was memorizing paired associates. 
Given the assumptions of associationist theory, paired-associate memorizing 
is as close as one can come to an ideal experimental task for studying basic
learning mechanisms. Those who agree with our conclusion that memorizing
paired associates involves storing relational representations of pairs and 
acquiring retrieval networks rather than forming undifferentiated connec- 
tions may also conclude with us that the hypothesis of associationism must 
be fundamentally in error. 
Associationism has been a pervasive and dominant theoretical framework,
and its rejection has far-reaching consequences. For one, there is a consider-
able implication regarding the probable usefulness of such research as that
reported in this book. If one believes that the basic mechanism by which
humans acquire knowledge is formation of undifferentiated connections
between ideas, then rote verbal learning, and especially paired-associate
memorizing, provides the most appropriate task environment in which to
observe acquisition of knowledge. But if we conclude that the basic mecha-
nism of learning is a formation of relational structure, then there
is a question whether the information we obtain by observing
rote verbal learning is as useful in furthering our understanding of basic
learning processes as are observations that can be made in other task environ-
ments. The difficulty is not that what occurs in rote learning is irrelevant to
theory. We have argued throughout this book that the fundamental process
of learning, namely cognitive organization, is exactly what subjects do when
they memorize paired associates. Moreover, other theorists (notably Mand-
ler, 1967) have presented similar arguments concerning other rote learning
tasks. The difficulty lies in the relatively weak structure of the materials that
subjects learn in these tasks. Motivated by the belief that organization is
a derivative of basic associative processes, investigators have tried to mini-
mize the possibility that subjects will find organizing principles in the mate-
rials used in experiments. Nevertheless, subjects do find organizing principles,
but the organizations found are relatively weak and extremely varied. If
learning is organization, then we can study it better by presenting material in
which the organization is relatively unambiguous, and by studying the sub-
jects' processes of achieving a representation of that organized material.
This conclusion about research in rote verbal learning has been voiced
many times, by scientists and other individuals who considered the study of
meaningful learning more important than the study of basic mechanisms.
Many persons have thought that experimental psychologists were perversely
stubborn in continuing research that seemed unrelated to real problems of
classroom instruction and of other practical settings where learning occurs.
We emphatically reject that view. The most important contribution of scien-
tists is to make possible an understanding of basic principles, not only
because of the human importance of increased understanding in itself, but
because genuine progress in basic understanding almost always brings
enormous practical consequences. We believe the protracted study of rote
verbal learning was not the product of a perverse insulation of scientists, but
rather was the product of an incorrect theory. Given a different understanding
of the nature of human knowledge and its acquisition, different tasks would
be chosen in which to study processes of learning. Only because of the
assumptions of associationist theory has rote verbal learning been considered
the most likely task environment in which to obtain information about
learning, and we consider the work done using that task environment to have
been the soundest work that could have been done, given the available knowl-
edge and understanding.
Another implication involves fundamental beliefs about educational prac-
tice. If one believes that learning involves formation of undifferentiated
connections, then instructional technology requires identifying the compo-
nent connections involved in a skill or set of ideas, then presenting the ideas
that need to be connected in appropriate combinations so that the connec-
tions will be formed. Important qualifications have been recognized, espe-
cially in Gagné's (1965) writing, consisting of the need to ensure the presence
of prerequisites such as discrim


a much deeper and more thorough unde
ognitive structure than we can obtain w
present concepts and techniques

to occur. But if learning is a process of
generating structure within an ex
tion should be paid to the chara at system, and there must be
e role in the learning process.
ding our understanding of the
nature of human knowledge. Associationism is required by the form of
empiricism that asserts that knowledge is derived from experience. We have
concluded that associationism is incorrect; that is, it does not correctly
describe the basic process by which human beings acquire knowledge. It
must follow that human knowledge is not derived from experience, but rather
derives from some important general cognitive capabilities interacting with
and growing in response to experience. The nature of those cognitive capabili-
ties is still very much in doubt. Piaget has provided important proposals
about their characteristics, and he has undoubtedly raised the right kinds of
questions, but it will be many years before we can hope to achieve a satisfac-
tory understanding of the complex processes and structures that support
intellectual development. But whatever the base requirements for intellectual
growth may turn out to be, there are some, and the implication is that innate
ideas in some form are a factor in the knowledge we possess. 

On the other hand, innate ideas do not determine the nature of our knowl- 
edge. We are able to modify our ideas, and when our experience conflicts 
with expectations that are implied by the ideas we have at the time, we do 
modify them. Experience is not sufficient to produce all we know, but it can 
be sufficient to produce changes in what we know by showing that what we 
thought we knew was false. While we reject empiricism of the extreme form, 
we maintain a very strong form of empiricism in which empirical evidence
provides the only legitimate basis for resolving differences of opinion about
the way things are. Indeed, we maintain that it is on the basis of empirical
evidence that we have shown that the associationist theory of paired-associate
memorizing is factually incorrect.
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